ABSTRACT 

In the context of the CNOC1 cluster survey, redshifts were obtained for galaxies 
in 16 clusters. The resulting sample is ideally suited for an analysis of the internal 
velocity and mass distribution of clusters. Previous analyses of this dataset used the 
Jeans equation to model the projected velocity dispersion profile. However, the results 
of such an analysis always yield a strong degeneracy between the mass density profile 
and the velocity dispersion anisotropy profile. Here we analyze the full (R, v) dataset 
of galaxy positions and velocities in an attempt to break this degeneracy. 

We build an 'ensemble cluster' from the individual clusters under the assumption 
that they form a homologous sequence; if clusters are not homologous then our 
results are probably still valid in an average sense. To interpret the data we study a 
one-parameter family of spherical models with different constant velocity dispersion 
anisotropy, chosen to all provide the same acceptable fit to the projected velocity 
dispersion profile. The best-fit model is sought using a variety of statistics, including 
the likelihood of the dataset, and the shape and Gauss-Hermite moments of the 
grand-total velocity histogram. The confidence regions and goodness-of-fit for the 
best-fit model are determined using Monte-Carlo simulations. Although the results of 
our analysis depend slightly on which statistic is used to judge the models, all statistics 
agree that the best-fit model is close to isotropic. For none of the statistics does the 
1-$\sigma$ confidence region extend below $\sigma_r/\sigma_t = 0.74$, or above $\sigma_r/\sigma_t = 1.05$. This result 
derives primarily from the fact that the observed grand-total velocity histogram is close 
to Gaussian, which is not expected to be the case for a strongly anisotropic model. 

The best-fitting models have a mass-to-number-density ratio that is approximately 
independent of radius over the range constrained by the data. They also have a 
mass-density profile that is consistent with the dark matter halo profile advocated by 
Navarro, Frenk & White, in terms of both the profile shape and the characteristic scale 
length. This adds important new weight to the evidence that clusters do indeed follow 
this proposed universal mass density profile. 

We present a detailed discussion of a number of possible uncertainties in our 
analysis, including our treatment of interlopers and brightest cluster galaxies, our use of 
a restricted one-parameter family of distribution functions, our use of spherical models 
for what is in reality an ensemble of non-spherical clusters, and our assumption that 
clusters form a homologous set. These issues all constitute important approximations 
in our analysis. However, none of the tests that we have done indicates that these 
approximations influence our results at a significant level. 



Subject headings: dark matter — galaxies: clusters: general — galaxies: kinematics 
and dynamics. 



1. Introduction 

Determinations of the internal velocity and mass distribution of galaxy clusters are of great 
value, since they have the potential to constrain both the main cosmological parameters and 
scenarios for large-scale structure formation (e.g., Crone, Evrard & Richstone 1994; Cole & Lacey 
1996; de Theije, van Kampen & Slijkhuis 1998, 1999). A recent development on this problem has 
been the prediction from cosmological simulations that dark matter halos have a universal density 
profile (Navarro, Frenk & White 1997; hereafter NFW). Testing the validity of this prediction 
is an important goal for any study of cluster structure. The traditional way to study these 
issues is to construct dynamical equilibrium models for the redshift measurements of individual 
cluster galaxies. Alternative approaches to infer the mass distribution of clusters are to use X-ray 
observations (e.g., Allen 1998; Hughes 1998), measurements of the weak- or strong-lensing of 
background sources (e.g., Bartelmann & Narayan 1995), or caustics in redshift space (Geller, 
Diaferio & Kurtz 1999; Diaferio 1999). Each of the different methods has its own uncertainties and 
biases, and results from different methods are often found to differ. The causes and magnitudes of 
these differences remain a hot topic of debate (e.g., Lewis et al. 1999 and references therein). 

Here we focus attention on the dynamical analysis of galaxy redshifts.^1 Analyses of this 
type have nearly always been restricted to modeling of the projected velocity dispersion profile 
$\sigma(R)$ (e.g., Kent & Gunn 1982; Merritt 1987), often using the Jeans equation (e.g., Solanes & 
Salvador-Solé 1990; den Hartog & Katgert 1996; Natarajan & Kneib 1996). The main limitation 
of such analyses is that there are two unknown functions of one variable, the mass density profile 
$\rho(r)$ and the velocity dispersion anisotropy profile $\beta(r)$, that are both constrained by only one 
function of one variable, $\sigma(R)$. As a result, there is always a degenerate set of models with different 
$\rho(r)$ and $\beta(r)$ that can all fit the observations equally well (Binney & Mamon 1982). Attempts 
are often made to break this degeneracy by assuming that either one of $\rho(r)$ or $\beta(r)$ is known, 
so that the other can be determined. Popular assumptions are that the velocity distribution is 
isotropic, or that the mass distribution can be calculated from the observed number density by 
the assumption of a constant mass-to-number density (or mass-to-light) ratio. However, neither 
of these assumptions has a strong physical justification, so these approaches do not remove the 
underlying degeneracy. 

As a result of steady improvements in instrumentation, and in particular the advent of efficient 
multi-object spectrographs, the available redshift samples for galaxy clusters have been steadily 
increasing in size. Important recent datasets are the ENACS (ESO Nearby Abell Cluster Survey) 
(e.g., Katgert et al. 1996) and the CNOC1 (Canadian Network for Observational Cosmology) 
cluster redshift survey^2 (e.g., Yee, Ellingson & Carlberg 1996). With improved statistics it should 



^1 Dynamical models of clusters of galaxies based on the Boltzmann equation (or its integrals, the Jeans equations 
and the virial theorem) always assume implicitly that the system is in equilibrium. While this is not strictly true for 
a cluster of galaxies, numerical simulations show that cluster evolution proceeds through a series of quasi-equilibrium 
states that satisfy the Boltzmann equation in an approximate sense (Natarajan, Hjorth, & van Kampen 1997). 

^2 We refer to the CNOC cluster redshift survey as 'CNOC1', to distinguish it from the subsequent CNOC2 field 
galaxy redshift survey (e.g., Lin et al. 1999). 



become possible to extract not only velocity dispersion profiles from the data, but also higher 
order moments that describe deviations of the observed velocity histograms from a Gaussian shape 
(Zabludoff, Franx & Geller 1993). Such measurements have the potential to break the degeneracy 
between the mass and velocity distribution, as emphasized in this context by, e.g., Merritt (1987; 
1993) and Merritt & Saha (1993). 

Here we focus on the data from the CNOC1 cluster survey. Redshifts were obtained for 
galaxies in 16 clusters at z = 0.17-0.55, selected on the basis of their X-ray luminosity. Previous 
analyses of these data were all based either on use of the virial theorem (Carlberg et al. 1996; 
hereafter C96) or the Jeans equation (Carlberg et al. 1997a,b,c; hereafter C97a,b,c). These 
analyses yielded the important result that if clusters of galaxies have a velocity distribution 
that is not too far from isotropic, then the CNOC1 data are consistent with clusters having 
an approximately constant mass-to-number-density ratio, and a mass-density profile that is 
approximately of the NFW form with a scale radius that is consistent with predictions from 
cosmological simulations. However, because of the degeneracy intrinsic to the Jeans equation, 
models with significant anisotropies and very different mass profiles can fit the projected velocity 
dispersion profile equally well. So the important question is: can the range of allowed models for 
the CNOC1 data be reduced by considering not only the projected velocity dispersion profile, but 
also the entire $(R, v)$ dataset and the implied higher-order velocity moments? This question is the 
main topic of the present paper. To answer it, we present a new analysis of the CNOC1 dataset, 
in part similar in spirit to that suggested by Merritt & Saha (1993), but with some differences (see 
below). 



In §2 we first repeat some of the Jeans modeling of $\sigma(R)$, but using a different approach 
from that employed by C97a,b,c. The results not only provide an illustration of the degeneracies 
involved in the modeling, but also serve as a useful starting point for the more detailed analysis. 
In §3 we analyze the full $(R,v)$ dataset. We seek a best-fit model using a variety of statistics, 
including the likelihood of the dataset and the shape and Gauss-Hermite moments of the 
grand-total velocity histogram. The confidence regions and goodness-of-fit for the best-fit model 
are estimated using Monte-Carlo simulations. In §4 we discuss possible uncertainties in our 
analysis, and how robust the conclusions of our analysis are in view of these uncertainties. In §5 
we present and discuss the final conclusions. In Appendix A we compare our modeling approach 
to that of Ramirez & De Souza (1998) and Ramirez, De Souza & Schade (1999), who recently used 
a more approximate method to constrain the orbital anisotropy of cluster galaxies from observed 
cluster velocity histograms (including that for the CNOC1 sample). 



2. Jeans equation models 

In the CNOC1 cluster survey redshifts were obtained for galaxies in 16 different clusters. The 
characteristic size and velocity of each cluster can be quantified by, e.g., the radius $r_{200}$ inside 
which the average mass density equals 200 times the critical density of the Universe (at the given 
redshift), and the line-of-sight velocity dispersion $\sigma$. For our dynamical analysis we treat the data 
in the same way as in C97b: we assume that clusters form a homologous sequence, each with identical 
structure in dimensionless units. To study this dimensionless structure we consider the dataset 
that contains for each galaxy in the survey the quantities $\tilde{R} \equiv R/r_{200}$ and $\tilde{v} \equiv v/\sigma$, where $R$ is 
the projected distance from the cluster center, $v$ is the observed line-of-sight velocity with respect 
to the systemic cluster velocity, and $r_{200}$ and $\sigma$ are taken from the analysis of C97b. One may 
consider the $(\tilde{R},\tilde{v})$ as quantities drawn from 'the ensemble cluster'. The advantage of combining 
data from different clusters is that it reduces the influence of, e.g., substructure and non-sphericity 
on the properties of the dataset. With this approach, spherical equilibrium models are likely to 



provide an adequate description of the data (see §4.4 and §E^ below). 
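
The stacking step described above can be sketched in a few lines. This is only an illustration of the scaling by $r_{200}$ and $\sigma$; the function name `to_ensemble`, the dictionary layout, and all numerical values are hypothetical and are not taken from the CNOC1 catalogs.

```python
def to_ensemble(clusters):
    """Pool galaxies from several clusters after scaling each galaxy's
    projected radius by its cluster's r200 and its velocity by sigma."""
    ensemble = []
    for cluster in clusters:
        for radius_mpc, velocity_kms in cluster["galaxies"]:
            ensemble.append((radius_mpc / cluster["r200_mpc"],
                             velocity_kms / cluster["sigma_kms"]))
    return ensemble


# Made-up illustrative numbers, not CNOC1 measurements.
clusters = [
    {"r200_mpc": 2.0, "sigma_kms": 1000.0,
     "galaxies": [(1.0, 500.0), (0.5, -1200.0)]},
    {"r200_mpc": 1.5, "sigma_kms": 800.0,
     "galaxies": [(0.3, 400.0)]},
]
ensemble = to_ensemble(clusters)  # list of dimensionless (R, v) pairs
```

Once scaled in this way, galaxies from clusters of different size and mass can be treated as draws from a single dimensionless system, which is the homology assumption stated in the text.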



Hereafter we write $(R,v)$ instead of $(\tilde{R},\tilde{v})$, with the understanding that all quantities 
discussed are dimensionless unless otherwise noted. In our dynamical analysis we also work in 
dimensionless units. To this end we adopt a unit of mass 

$$M_u = (2.3252 \times 10^{14}\, M_\odot)\, (r_{200}/\mathrm{Mpc})\, (\sigma/[10^3\, \mathrm{km\,s^{-1}}])^2 , \qquad (1)$$

chosen to set the gravitational constant to G = 1. 
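
As a quick check of the unit of mass, setting $G = 1$ with $r_{200}$ as the unit of length and $\sigma$ as the unit of velocity implies $M_u = r_{200}\,\sigma^2/G$. The sketch below evaluates this in SI units; the constants are standard values supplied here for illustration, and the function name `mass_unit_msun` is hypothetical.

```python
# Standard physical constants (supplied for this sketch, not quoted in the text).
GM_SUN = 1.32712440018e20     # G times the solar mass, in m^3 s^-2
MPC_IN_M = 3.0856775814914e22  # one megaparsec in metres


def mass_unit_msun(r200_mpc, sigma_kms):
    """Mass unit r200 * sigma^2 / G, expressed in solar masses."""
    r_m = r200_mpc * MPC_IN_M
    sigma_ms = sigma_kms * 1.0e3
    return r_m * sigma_ms ** 2 / GM_SUN


# Coefficient for r200 = 1 Mpc and sigma = 1000 km/s.
coefficient = mass_unit_msun(1.0, 1000.0)
```

The coefficient scales linearly with $r_{200}$ and quadratically with $\sigma$, as in equation (1).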

The projected galaxy number density profile $\Sigma(R)$ of the CNOC1 ensemble cluster was 
derived in C97b, with corrections for both the non-uniform sampling of the clusters (see Yee et 
al. 1996) and interloper contamination. Figure 1 shows the result; $\Sigma$ is the number of galaxies per 
unit area with K-corrected absolute Gunn r-band magnitude $M_r < -18.5$,^3 normalized to unity 
over the circular region with radius $r_{200}$. For the dynamical modeling we have fitted to the data a 
smooth function of the form 

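$$\Sigma(R) = \Sigma_0\, 2^{(\beta-\gamma)/\alpha}\, (R/b)^{-\gamma}\, [1 + (R/b)^{\alpha}]^{-(\beta-\gamma)/\alpha}\, [1 + (R/c)^{\delta}]^{-(\epsilon-\beta)/\delta} . \qquad (2)$$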

This somewhat arbitrary parameterization is similar to the 'nuker-law' advocated by, e.g., Byun 
et al. (1996), but is more general in having two power-law breaks rather than one. A good fit was 
obtained with $\Sigma_0 = 1.03$, $b = 0.21$, $c = 1.32$, $\alpha = 2.97$, $\beta = 1.51$, $\gamma = 0.65$, $\delta = 4.00$, and $\epsilon = 3.00$. 
The parameterized fit is shown in Figure 1. The fit to the data is statistically acceptable, but note 
that the fit is not unique; the parameters of equation (2) are strongly correlated, and are not all 
equally well constrained. C97c showed that the projection of a profile such as that advocated by 
NFW also provides an acceptable fit to the data. 
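
The fitted profile can be evaluated directly. The sketch below assumes the double-break form $\Sigma(R) = \Sigma_0\, 2^{(\beta-\gamma)/\alpha} (R/b)^{-\gamma} [1+(R/b)^\alpha]^{-(\beta-\gamma)/\alpha} [1+(R/c)^\delta]^{-(\epsilon-\beta)/\delta}$ with the parameter values quoted in the text; the function names are hypothetical. It also computes two sanity checks: the asymptotic logarithmic slopes, which should approach $-\gamma$ and $-\epsilon$, and the integrated counts inside $R = 1$, which should be close to unity given the stated normalization.

```python
import math

# Fitted parameters quoted in the text (R in units of r200).
PARAMS = dict(sigma0=1.03, b=0.21, c=1.32, alpha=2.97, beta=1.51,
              gamma=0.65, delta=4.00, eps=3.00)


def surface_density(R, sigma0, b, c, alpha, beta, gamma, delta, eps):
    """Double-break power law: slope -gamma inside b, -beta between b and c,
    and -eps outside c."""
    x = R / b
    inner = 2.0 ** ((beta - gamma) / alpha) * x ** (-gamma)
    first_break = (1.0 + x ** alpha) ** (-(beta - gamma) / alpha)
    second_break = (1.0 + (R / c) ** delta) ** (-(eps - beta) / delta)
    return sigma0 * inner * first_break * second_break


def log_slope(R, h=1.0e-4):
    """Central-difference estimate of d ln(Sigma) / d ln(R)."""
    up = surface_density(R * math.exp(h), **PARAMS)
    down = surface_density(R * math.exp(-h), **PARAMS)
    return (math.log(up) - math.log(down)) / (2.0 * h)


def counts_inside(R_max=1.0, n=20000):
    """Midpoint-rule integral of 2 pi R Sigma(R) from 0 to R_max."""
    h = R_max / n
    total = 0.0
    for k in range(n):
        R = (k + 0.5) * h
        total += 2.0 * math.pi * R * surface_density(R, **PARAMS)
    return total * h


inner_slope = log_slope(1.0e-4)
outer_slope = log_slope(1.0e4)
total_counts = counts_inside()
```

Because the parameters are strongly correlated, as noted above, quite different parameter sets could give nearly the same curve over the radial range covered by the data.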

To interpret the dynamics of the ensemble cluster we first model the projected velocity 
dispersion profile using the Jeans equation for a spherical system. The software used for this was 
similar to that discussed in van der Marel (1994). The intrinsic galaxy number density profile 
$\nu(r)$ follows from the projected galaxy number density profile $\Sigma(R)$ by solution of the relevant 
Abel transform equation. An ansatz is made for the mass density profile $\rho(r)$ of the cluster, and 
from this the gravitational potential and gravitational force are calculated. The Jeans equation is 
then solved for the intrinsic velocity dispersions of the system (the tangential velocity dispersion 
$\sigma_t \equiv \sigma_\theta = \sigma_\phi$, and the radial velocity dispersion $\sigma_r$), for some velocity dispersion anisotropy profile 
$\beta(r)$, where $\beta \equiv 1 - \sigma_t^2/\sigma_r^2$. The intrinsic velocity dispersion components in the direction to the 



^3 The value of the absolute magnitude limit assumes a Hubble constant $H_0 = 100$ km s$^{-1}$ Mpc$^{-1}$. This is the only 
quantity in this paper that depends on Ho, because our entire analysis is performed in dimensionless units. 



observer are then weighted with the intrinsic galaxy number density and projected along the line 
of sight to yield the projected line-of-sight velocity dispersion profile $\sigma(R)$, which can be compared 
to observations. 
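
Since the anisotropy is quoted both as $\beta$ and as the ratio $\sigma_r/\sigma_t$, a small conversion helper may be useful. It uses only the definition $\beta = 1 - \sigma_t^2/\sigma_r^2$ given above; the function names are hypothetical.

```python
import math


def beta_from_ratio(sigma_r_over_sigma_t):
    """Anisotropy beta = 1 - sigma_t^2 / sigma_r^2 from the ratio sigma_r / sigma_t."""
    return 1.0 - 1.0 / sigma_r_over_sigma_t ** 2


def ratio_from_beta(beta):
    """Inverse relation; requires beta < 1 (beta = 1 is purely radial motion)."""
    if beta >= 1.0:
        raise ValueError("beta must be smaller than 1")
    return 1.0 / math.sqrt(1.0 - beta)


isotropic = beta_from_ratio(1.0)         # beta = 0
tangential = beta_from_ratio(1.0 / 3.0)  # beta < 0
radial = beta_from_ratio(3.0)            # 0 < beta < 1
```

Note that the mapping is strongly asymmetric: tangential anisotropy corresponds to arbitrarily negative $\beta$, while radial anisotropy is confined to $0 < \beta < 1$.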

As discussed in §1, there is always a degenerate set of models with different $\rho(r)$ and $\beta(r)$ that 
all fit $\sigma(R)$ equally well. The goal of the present paper is to see to what extent this degeneracy can 
be broken by modeling not only the projected velocity dispersion profile, but also the individual 
$(R,v)$ data points (and thus indirectly the higher order velocity moments). We do not wish to be 
too ambitious and therefore restrict ourselves to a one-parameter family of models, namely those 
in which the velocity dispersion anisotropy is independent of radius, $\beta(r) =$ constant. There is a 
unique mass density profile $\rho(r)$ for each constant $\beta$, such that models with different $\beta$ all provide 
an identical fit to the observations. Instead of attempting to infer non-parametrically the optimum 
$\rho(r)$ for each $\beta$, we adopt a simple three-parameter family of models among which we seek the one 
that fits best. We set 

$$\rho(r) = \rho_0\, (r/a)^{-\xi}\, [1 + (r/a)]^{\xi-3} . \qquad (3)$$

The parameters $\rho_0$ and $a$ set the scale of the mass distribution in density and length, while $\xi$ is the 
logarithmic power-law slope near the center. For $\xi = 1$ this mass density is of the form advocated 
by NFW; for $\xi = 0$ the mass density has a homogeneous core. 
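
A minimal sketch of this density family, assuming the form $\rho(r) = \rho_0 (r/a)^{-\xi} [1 + r/a]^{\xi-3}$, is given below together with a numerical enclosed mass. The default scale length is an arbitrary illustrative value, and the function names are hypothetical. For $\xi = 1$ the result can be compared with the standard closed-form NFW enclosed mass.

```python
import math


def rho(r, rho0=1.0, a=0.25, xi=1.0):
    """Three-parameter density: slope -xi near the center, -3 at large radii."""
    x = r / a
    return rho0 * x ** (-xi) * (1.0 + x) ** (xi - 3.0)


def enclosed_mass(r, rho0=1.0, a=0.25, xi=1.0, n=20000):
    """M(r) = 4 pi * integral of rho(s) s^2 ds from 0 to r (midpoint rule)."""
    h = r / n
    total = 0.0
    for k in range(n):
        s = (k + 0.5) * h
        total += rho(s, rho0, a, xi) * s * s
    return 4.0 * math.pi * total * h


def nfw_mass(r, rho0=1.0, a=0.25):
    """Closed-form enclosed mass for the xi = 1 (NFW) case."""
    x = r / a
    return 4.0 * math.pi * rho0 * a ** 3 * (math.log(1.0 + x) - x / (1.0 + x))


numeric = enclosed_mass(1.0)
analytic = nfw_mass(1.0)
```

The same `enclosed_mass` routine works for a cored profile ($\xi = 0$), for which no special treatment of the center is needed.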

The projected velocity dispersion profile $\sigma(R)$ of the CNOC1 ensemble cluster was derived in 
C97b, with appropriate correction for interloper contamination. The dispersion profile that we 
have used in our analysis was obtained with a similar but slightly updated treatment of the data, 
and is shown in Figure 2. The inferred profile is mildly different from that analyzed in C97b, but is 
equivalent in a statistical sense. Eleven constant-$\beta$ models were constructed to interpret the data, 
with $\beta$ chosen such that the ratio $\sigma_r/\sigma_t$ was sampled logarithmically between 1/3 and 3. For each 
$\beta$ we determined the $\rho(r)$ for which the model predictions best fit the data. To this end we used a 
grid in the $(a, \xi)$ parameter space. For each $(a, \xi)$ we calculated the shape of the predicted velocity 
dispersion profile, and the density normalization $\rho_0$ that yields the best fit to the observed velocity 
dispersion profile in a $\chi^2$ sense. We then sought the minimum $\chi^2$ over the $(a, \xi)$ parameter space 
to find the best-fitting mass density profile $\rho(r)$ for the given $\beta$. The velocity dispersion profiles 
predicted by these best-fit models are shown as curves in Figure 2. Apart from their behavior at small 
radii (where there are no strong constraints from the data) the model predictions are similar 
for the different values of $\beta$, as expected based on the degeneracy of the problem. The small 
differences between the various models are due to the fact that a parameterized form was used 
for the mass density $\rho(r)$; the true $\rho(r)$ that best fits the data may not be exactly of the adopted 
form. However, the fits to the data in Figure 2 are all statistically acceptable, as judged by the 
$\chi^2$ of the fit. So even though a non-parameterized modeling approach would likely yield mass 
densities $\rho(r)$ that differ from the parameterized form adopted here, the differences would not be 
statistically significant. This provides a posteriori justification for the use of the adopted mass 
density parameterization. 
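
Two steps of this procedure lend themselves to a short sketch. First, the grid of eleven anisotropies: we assume here that "sampled logarithmically between 1/3 and 3" means equal steps in $\log(\sigma_r/\sigma_t)$ with both end points included. Second, the normalization: with $G = 1$ the Jeans equation is linear in the mass, so the predicted dispersion scales as $\sqrt{\rho_0}$ and the best-fitting $\rho_0$ for a given profile shape follows from a linear least-squares problem. The function name and the synthetic numbers below are hypothetical, and the fit is assumed to be done on $\sigma$ rather than $\sigma^2$.

```python
# Eleven anisotropies, equally spaced in log(sigma_r / sigma_t) from 1/3 to 3
# (end points assumed to be included).
ratios = [3.0 ** ((k - 5) / 5.0) for k in range(11)]


def best_normalization(sigma_obs, sigma_err, sigma_unit):
    """Given the profile predicted for rho0 = 1 (sigma_unit), return the rho0
    that minimizes chi^2 and the corresponding chi^2 value.

    The model is sigma_model = sqrt(rho0) * sigma_unit, so chi^2 is quadratic
    in s = sqrt(rho0) and the minimum is available in closed form.
    """
    num = sum(o * u / e ** 2 for o, u, e in zip(sigma_obs, sigma_unit, sigma_err))
    den = sum(u * u / e ** 2 for u, e in zip(sigma_unit, sigma_err))
    s = num / den
    chi2 = sum(((o - s * u) / e) ** 2
               for o, u, e in zip(sigma_obs, sigma_unit, sigma_err))
    return s * s, chi2


# Synthetic example: "observations" that are exactly twice the unit profile.
unit_profile = [1.10, 1.00, 0.90, 0.75]
observed = [2.0 * u for u in unit_profile]
errors = [0.1, 0.1, 0.1, 0.1]
rho0_best, chi2_best = best_normalization(observed, errors, unit_profile)
```

Solving for the normalization analytically leaves only the two shape parameters $(a, \xi)$ to be searched on a grid, which keeps the fit inexpensive.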

Figure 3 shows the parameters $\rho_0$, $a$ and $\xi$ of the best-fitting mass density $\rho(r)$ as a function 
of the anisotropy $\sigma_r/\sigma_t$. Solid dots indicate the models for which the predictions are shown in 
Figure 2. The profiles $\rho(r)$ for these models are shown in the top right panel of Figure 4. As the 
value of $\sigma_r/\sigma_t$ is increased, the scale radius $a$ increases while the small radii slope $\xi$ decreases. 
Consequently, tangentially anisotropic models have higher mass densities and steeper mass density 
profiles at small radii than radially anisotropic models. Very radially anisotropic models formally 
attain their best fit for $\xi < 0$, i.e., with decreasing mass densities near the center. This seems 
implausible for various reasons, and we therefore fixed $\xi = 0$ for these models. This has little effect 
on the quality of the fit to the observed velocity dispersion profile, cf. Figure 2. 

The bottom right panel of Figure 4 shows the enclosed mass profile $M(r)$ for the models. The 
profile shapes depend strongly on the anisotropy, with tangentially anisotropic models being more 
centrally concentrated than radially anisotropic models. Interestingly, $M_{200}$, the enclosed mass 
within $r_{200}$ ($\log(r) = 0$ in dimensionless units), is virtually independent of the assumed anisotropy.^4 
This was noticed previously by C97b, and significantly reduces the uncertainty in estimates of the 
cosmological mass density $\Omega$ from cluster mass-to-light ratios using Oort's method. 

The top left panel of Figure 4 shows the number density profile $\nu(r)$ of the models, as obtained 
by deprojection of the projected profile shown in Figure 1. When combined with the mass density 
profile $\rho(r)$ for each model one obtains the mass-to-number-density ratio $\rho/\nu(r)$, which is shown 
in the bottom left panel of Figure 4. This ratio decreases with radius for tangentially anisotropic 
models, and increases with radius for radially anisotropic models. The isotropic model (heavy 
curve) has a mass-to-number-density ratio $\rho/\nu(r)$ that is very nearly independent of radius. Note 
that the logarithmic slopes of the adopted $\nu(r)$ and $\rho(r)$ differ at asymptotically small and large 
radii (for $r \to 0$: $\nu(r) \propto r^{-1.65}$ and $\rho(r) \propto r^{-\xi}$; for $r \to \infty$: $\nu(r) \propto r^{-4}$ and $\rho(r) \propto r^{-3}$). So it is not 
an intrinsic property of the adopted parameterizations that isotropic models have approximately 
constant $\rho/\nu(r)$ over the region where this quantity is constrained by the data. 

If the galaxy luminosity function of clusters is independent of position, then the mass-to-light 
ratio $M/L$ is directly proportional to the mass-to-number-density ratio $\rho/\nu(r)$. In physical units, 
$M/L$ is given by the dimensionless function $\rho/\nu(r)$ shown in Figure 4, multiplied by $M_u/L_{200}$. 
Here $M_u$ is as given in equation (1), and $L_{200}$ is the total luminosity inside a projected radius of 
$r_{200}$. The quantities $r_{200}$, $\sigma$ and $L_{200}$ for the individual clusters of the CNOC1 redshift survey are 
listed in Table 4 of C96 and Table 1 of C97b. 

NFW have argued that cosmological simulations yield a more-or-less universal mass density 
profile $\rho(r)$ for dark matter halos, given by equation (3) with $\xi = 1$. Our models generate such a 
profile if $\log(\sigma_r/\sigma_t) = -0.06$, i.e., $\sigma_r/\sigma_t \approx 0.87$ (cf. Figure 3). This model is close to isotropic, 
but has mild tangential anisotropy. Like the isotropic model, it has an approximately constant 
mass-to-number-density ratio (cf. Figure 4). The logarithmic slope of $\rho(r)$ at the last data point 
($R = 1.5$) equals 2.7. So even though this model is consistent with an NFW profile, the data do 
not actually allow us to test whether the mass density slope converges to 3 at large radii, as in the 
NFW profile. Figure 3 shows that the model with $\log(\sigma_r/\sigma_t) = -0.06$ has scale length $a = 0.24$ (as 



^4 Efstathiou, Ellis, & Carter (1980) studied the case of test-particles with number density $\nu(r) \propto r^{-3}$ in a spherical 
isothermal potential. The enclosed mass $M(r)$ is then strictly independent of $\beta$ for all radii. The present result is 
different in that $M(r)$ for the models studied here does depend strongly on $\beta$ for $r \ll r_{200}$ and $r \gg r_{200}$, but not for 
$r \approx r_{200}$. 



before, in units of $r_{200}$). This is in excellent agreement with the predictions of NFW's cosmological 
simulations, which predict $a = 0.20$ for an $\Omega_0 = 0.2$ open cold dark matter model and $a = 0.26$ for 
a flat $\Omega_0 = 0.2$ model (cf. C97c). The Jeans modeling therefore shows that the CNOC1 data are 
consistent with both an NFW mass density profile and a constant mass-to-number-density ratio 
for clusters of galaxies, but only if the unknown velocity dispersion anisotropy has a particular 
value that is close to isotropic. Figures 3 and 4 show that models with other properties can fit the 
data equally well. To break this degeneracy we proceed with a more sophisticated analysis that 
uses the entire $(R,v)$ dataset, instead of just the projected velocity dispersion profile. 



3. Distribution function models 

3.1. Model calculation 

To interpret the velocities of individual galaxies in the CNOC1 ensemble cluster we need 
to construct models based on phase space distribution functions (DFs).^5 As before, we restrict 
ourselves to models with constant anisotropy $\beta$. For each $\beta$, we fix the three-dimensional galaxy 
number density $\nu(r)$ and mass density $\rho(r)$ to those calculated in §2 from the Jeans models. 
Fixing the anisotropy $\beta$ does not uniquely determine the DF. For given $\nu(r)$ and $\rho(r)$ there are 
infinitely many DFs that all predict the same second order velocity moments; a DF is determined 
uniquely only after specification of all its velocity moments (e.g., Dejonghe 1986). Here we do 
not seek to derive the full set of DFs that can generate a model with a given anisotropy, but 
instead we are satisfied to find just one. To achieve this we make a simple ansatz for the DF: 
$f = f_\beta(E,L) = g_\beta(E)\, L^{-2\beta}$, where $E$ and $L$ are the binding energy and angular momentum per 
unit mass. Such models have fixed constant anisotropy $\beta$ (e.g., Hénon 1973; Kent & Gunn 1982; 
Cuddeford 1991). The problem lies in finding the function $g_\beta(E)$ that generates the required $\nu(r)$ 
in the given $\rho(r)$. The number-density profile is given by 

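$$\nu(r) \equiv \int f(E,L)\, d^3v = 2^{3/2-\beta}\, \pi^{3/2}\, r^{-2\beta}\, \frac{\Gamma(1-\beta)}{\Gamma(\frac{3}{2}-\beta)} \int_0^{\psi(r)} g_\beta(E)\, [\psi(r) - E]^{1/2-\beta}\, dE . \qquad (4)$$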

Here $\psi(r) \equiv E + \frac{1}{2}v^2$ is the relative gravitational potential, which is uniquely determined by the 
mass density distribution p{r) through Poisson's equation (e.g., Binney & Tremaine 1987). 
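
The Gamma-function ratio in the number-density integral comes from integrating the $L^{-2\beta}$ factor of the DF over the direction of the velocity vector. With $\eta$ the angle between the velocity and the radial direction, $L = r v \sin\eta$ and $d^3v = 2\pi v^2 \sin\eta\, d\eta\, dv$, so the angular part is $\int_0^\pi \sin^{1-2\beta}\eta\, d\eta = \sqrt{\pi}\,\Gamma(1-\beta)/\Gamma(\frac{3}{2}-\beta)$. The sketch below checks this identity numerically for a few values of $\beta$; it is a consistency check written for this purpose, not part of the modeling code described in the text, and the function names are hypothetical.

```python
import math


def angular_factor_numeric(beta, n=200000):
    """Midpoint-rule estimate of the integral of sin(eta)^(1 - 2 beta)
    over eta from 0 to pi. Converges for beta < 1."""
    h = math.pi / n
    total = 0.0
    for k in range(n):
        eta = (k + 0.5) * h
        total += math.sin(eta) ** (1.0 - 2.0 * beta)
    return total * h


def angular_factor_closed(beta):
    """Closed form sqrt(pi) * Gamma(1 - beta) / Gamma(3/2 - beta)."""
    return math.sqrt(math.pi) * math.gamma(1.0 - beta) / math.gamma(1.5 - beta)


# One tangential, one isotropic and one mildly radial case.
checks = {beta: (angular_factor_numeric(beta), angular_factor_closed(beta))
          for beta in (-1.0, 0.0, 0.25)}
```

For $\beta = 0$ the factor equals 2, which together with the remaining $2\pi$ recovers the familiar $4\pi$ of the isotropic case.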

For each fixed value of $\beta$ we solve equation (4) for $g_\beta(E)$ using the penalized likelihood 
method described in Appendix C of Magorrian & Tremaine (1999). This generally yields a solution 
for which the predicted $\nu(r)$ agrees to within $\lesssim 1\%$ with the $\nu(r)$ inferred from the data. Only 
for radially anisotropic models with $\sigma_r/\sigma_t \gtrsim 2$ could we not achieve such good agreement. This 



^5 Ramirez & De Souza (1998) and Ramirez, De Souza & Schade (1999) bypassed the calculation of DFs in their 
modeling of the projected velocity distributions of clusters by making a number of simplifying assumptions, and they 
used the resulting kinematic models to draw conclusions about the velocity dispersion anisotropy of cluster galaxies 
of different morphological types. Appendix A presents a quantitative assessment of the validity of their approach and 
conclusions. 



is not due to a numerical flaw in our approach, but indicates that such radially anisotropic DFs 
cannot produce central density cusps that are as shallow as observed (Figure 1). This by itself can 
be taken as evidence against these models. However, the models fail only at radii $r \lesssim 0.1$, where 
the constraints from the data are not very strong (due to small-number statistics). Because of 
this, we have not excluded radially anisotropic models with $\sigma_r/\sigma_t \gtrsim 2$ from our analysis. Instead 
we use for them the best, albeit imperfect, solution to equation (4). This has little impact on our 
final conclusions, because as we will show below, even more mildly radially anisotropic models are 
already strongly ruled out by the data. 

Once the DF has been calculated, we are interested in the probability $\mathcal{F}_\beta(R,v)\,dv$ that a 
particle observed at projected radius $R$ has a line-of-sight velocity between $v$ and 
$v + dv$. This probability is given by an integral over the DF: 

$$\mathcal{F}_\beta(R,v) = \frac{1}{\Sigma(R)} \int dz \iint dv_x\, dv_y\, f_\beta(E,L) , \qquad (5)$$

where $(x,y,z)$ is a Cartesian coordinate system with the $z$-axis along the line of sight (i.e., the 
line-of-sight velocity $v$ equals $v_z$). For the numerical evaluation of the outer, line-of-sight, integral 
we change variables from $z$ to $u = \mathrm{arsinh}(z/R)$. The evaluation of the inner integral over $(v_x,v_y)$ 
is more complicated. We transform to polar coordinates $(w,\zeta)$ defined by $v_x = w\cos\zeta$ and 
$v_y = w\sin\zeta$. For models with $\beta \le 0$ we directly evaluate the resulting double integral over $(w,\zeta)$. 
Models with $\beta > 0$, however, have an awkward singularity in the DF at $L = 0$. For these models 
we apply an additional change of variables from $\zeta$ to $s = \ln\tan\frac{1}{2}\zeta$. 



3.2. Data-model comparison 



We used the approach of §3.1 to calculate the DFs and probability distributions $\mathcal{F}_\beta(R,v)$ for 
each of the eleven constant-$\beta$ models shown in Figure 2. With these models we study the $(R, v)$ 
datapoints in the CNOC1 cluster survey for those galaxies with K-corrected absolute Gunn r-band 
magnitude $M_r < -18.5$. Galaxies in the clusters MS 0906+11 and MS 1358+62 were excluded; 
these clusters are strong binaries for which spherical models cannot be appropriate (C97b). The 
sample of galaxies from the remaining 14 survey clusters was restricted to those galaxies with 
$R < 1.5$ and $|v| < v_{\max} = 4$. So, without loss of generality, we ignore the regions of $(R,v)$ space 
where cluster members are scarce among large numbers of interlopers. The remaining sample 
contains 990 galaxies. 

The sample thus defined is still expected to contain interlopers. We have not attempted any 
identification and removal of individual interlopers, but instead take interloper contamination 
into account in a statistical sense. We assume: (i) that the density of interlopers at fixed R is 
homogeneous in $v$; and (ii) that the probability $f_i$ for a galaxy at known $R$ but unknown $v$ to be 
an interloper is independent of $R$. With these assumptions, $f_i$ is the total fraction of interlopers in 
the sample. While this treatment of interlopers was motivated in part by mathematical simplicity, 
it does not oversimplify things to the point where the models become inadequate. We discuss the 
merits and limitations of our treatment of interlopers in detail in §4.1 below. 

3.2.1. Likelihood analysis 

Let galaxy number $j$ in the sample be observed at radius $R_j$. The probability $\mathcal{F}(R_j,v)\,dv$ that 
the observed velocity $v$ of the galaxy falls between $v$ and $v + dv$ is given by 

$$\mathcal{F}(R_j,v)\,dv = \begin{cases} f_i\,(dv/2v_{\max}) + (1 - f_i)\,\tilde{\mathcal{F}}_\beta(R_j,v)\,dv , & |v| < v_{\max} ; \\ 0 , & |v| > v_{\max} . \end{cases} \qquad (6)$$

In this equation $\tilde{\mathcal{F}}_\beta(R_j,v)$ denotes the convolution of $\mathcal{F}_\beta(R_j,v)$ with a normalized Gaussian of 
dispersion $\Delta v_j$, where $\Delta v_j$ is the formal measurement error in the determination of the velocity $v_j$ 
of galaxy $j$. The velocity errors are typically $\lesssim 140$ km s$^{-1}$, or $|\Delta v_j| \lesssim 0.15$ in dimensionless units 
(small enough to have negligible influence on any part of our analysis). The probability $\mathcal{F}(R_j,v)$ 
in equation (6) is normalized to unity, since realistic dynamical models predict $\mathcal{F}_\beta(R,v) \approx 0$ for 
$|v| > v_{\max} = 4$. 
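
The structure of this mixture probability is easy to sketch: a flat interloper term $f_i/(2 v_{\max})$ plus $(1 - f_i)$ times the model velocity distribution, truncated at $|v| = v_{\max}$. In the sketch below a unit Gaussian stands in for the DF-based distribution, which is an assumption made purely for illustration, and the convolution with the measurement error is omitted. All function names are hypothetical.

```python
import math

V_MAX = 4.0  # velocity cut in dimensionless units, as quoted in the text


def gaussian_pdf(v, dispersion=1.0):
    """Placeholder for the model velocity distribution at fixed radius.
    The actual distribution follows from the DF, not from a Gaussian."""
    return (math.exp(-0.5 * (v / dispersion) ** 2)
            / (dispersion * math.sqrt(2.0 * math.pi)))


def total_pdf(v, f_i, model_pdf=gaussian_pdf):
    """Flat interloper term plus cluster-member term, zero beyond the cut."""
    if abs(v) >= V_MAX:
        return 0.0
    return f_i / (2.0 * V_MAX) + (1.0 - f_i) * model_pdf(v)


def integral_over_velocity(f_i, n=8000):
    """Midpoint-rule integral of total_pdf over [-V_MAX, V_MAX]."""
    h = 2.0 * V_MAX / n
    return sum(total_pdf(-V_MAX + (k + 0.5) * h, f_i) for k in range(n)) * h


norm = integral_over_velocity(f_i=0.115)
```

The integral is very close to unity because a distribution of unit dispersion has almost no probability beyond $|v| = 4$, which mirrors the normalization argument made in the text.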

The complete dataset consists of $N$ galaxies observed at $(R_j,v_j)$, with $j = 1,\ldots,N$. 
The probability of this dataset in a given cluster model is proportional to the likelihood 
$L \equiv \prod_{j=1}^{N} \mathcal{F}(R_j,v_j)$. Instead of maximizing the likelihood we seek to minimize the quantity^6 

$$\lambda \equiv -2 \ln L = -2 \sum_{j=1}^{N} \ln[\mathcal{F}(R_j,v_j)] . \qquad (7)$$

The goal is to calculate the likelihood quantity $\lambda$ on a grid of the model parameters $(\beta, f_i)$, and 
to search for the best-fitting model that yields the minimum value $\lambda_{\min}$. Two further questions 
then remain to be answered: (a) is the best-fitting model statistically acceptable; and (b) what 
are the confidence regions around the best-fitting model parameters? To address the first question 
we resort to Monte-Carlo simulations. For each galaxy $j$ in the sample we draw a velocity from 
the probability distribution $\mathcal{F}(R_j,v)$ for the best-fitting model. For the resulting pseudo-dataset 
we calculate the likelihood quantity $\lambda$ as defined by equation (7). This is repeated many times. 
From the resulting set of $\lambda$ values we calculate the median, as well as the confidence regions 
around the median that contain a fixed fraction of the simulated $\lambda$ values (e.g., 68.3%, which 
corresponds to 1-$\sigma$ for a Gaussian distribution, or 95.4% which corresponds to 2-$\sigma$). To address 
the second question and obtain confidence regions around the best-fitting model parameters, we 
use a well-known theorem of mathematical statistics (e.g., Stuart & Ord 1991; used also by Merritt 
& Saha 1993): the likelihood-ratio statistic $\lambda - \lambda_{\min}$ tends to a $\chi^2$ statistic in the limit of $N \to \infty$, 
with the number of degrees-of-freedom equal to the number of free parameters that have not yet 
been varied and chosen so as to optimize the fit. Hence, the likelihood-ratio statistic $\lambda - \lambda_{\min}$ 
reduces to the well-known $\Delta\chi^2$ statistic (e.g., Press et al. 1992) for $N \to \infty$, despite the fact that 
the $\mathcal{F}(R_j,v)$ are not individually Gaussian. This is a consequence of the central limit theorem. 
In principle it would be more robust to calculate the confidence regions on the best-fitting model 
parameters $(\beta, f_i)$ by means of Monte-Carlo simulation (or, e.g., bootstrapping), but in the present 
context this was found to be prohibitively expensive computationally. 
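
Both steps of this procedure can be sketched with a toy model. The goodness-of-fit test draws pseudo-datasets from the best-fitting model and records the distribution of $\lambda = -2\ln L$; the confidence interval on a single parameter collects the grid points with $\lambda - \lambda_{\min} \le 1$ (68.3%) or $\le 4$ (95.4%). In the sketch below a Gaussian plus a flat interloper floor stands in for the DF-based probability, which is an assumption for illustration only; the radius dependence of the probability is ignored, and all function names and numbers are hypothetical.

```python
import math
import random

V_MAX = 4.0  # velocity cut in dimensionless units


def pdf(v, f_i, s):
    """Flat interloper floor plus a Gaussian of width s (placeholder model)."""
    if abs(v) >= V_MAX:
        return 0.0
    gauss = math.exp(-0.5 * (v / s) ** 2) / (s * math.sqrt(2.0 * math.pi))
    return f_i / (2.0 * V_MAX) + (1.0 - f_i) * gauss


def likelihood_quantity(velocities, f_i, s):
    """lambda = -2 ln L for a set of velocities."""
    return -2.0 * sum(math.log(pdf(v, f_i, s)) for v in velocities)


def draw_velocity(rng, f_i, s):
    """Draw from the mixture; redraw the rare values beyond the cut."""
    while True:
        if rng.random() < f_i:
            v = rng.uniform(-V_MAX, V_MAX)
        else:
            v = rng.gauss(0.0, s)
        if abs(v) < V_MAX:
            return v


def monte_carlo_lambda(n_gal, f_i, s, n_sim=200, seed=1):
    """Median and central 68.3% range of lambda over simulated datasets."""
    rng = random.Random(seed)
    sims = sorted(
        likelihood_quantity([draw_velocity(rng, f_i, s) for _ in range(n_gal)],
                            f_i, s)
        for _ in range(n_sim))
    return (sims[int(0.1585 * n_sim)], sims[n_sim // 2],
            sims[int(0.8415 * n_sim)])


def confidence_interval(grid, lambdas, delta=1.0):
    """Range of grid values with lambda - lambda_min <= delta."""
    lam_min = min(lambdas)
    inside = [p for p, lam in zip(grid, lambdas) if lam - lam_min <= delta]
    return min(inside), max(inside)


low, median, high = monte_carlo_lambda(n_gal=200, f_i=0.1, s=1.0,
                                       n_sim=100, seed=3)
```

An observed $\lambda$ that falls inside the simulated range indicates an acceptable fit; the interval from `confidence_interval` is only as fine as the parameter grid on which $\lambda$ was evaluated.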

^6 In an ideal case in which the $\mathcal{F}(R_j, v)$ are all Gaussian (not true here), $\lambda$ reduces to a $\chi^2$ statistic. 

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Figure ^ shows the interloper fraction fi that minimizes A as a function of anisotropy, with 1-a 
error bars calculated as described above. The figure shows that /i is in the range /i = 0.115 it 0.02 
for all anisotropics studied, with a formal error of ±0.02 at fixed anisotropy. In the remainder of 
the paper we always use, for each given anisotropy, the optimal /i shown in Figure |5|. Figure |^ 
shows A as function of the velocity dispersion anisotropy. The overall minimum is Amin = 1783.1 
for the model with ar/cTt = 0.92 (i.e., not far from isotropic). Monte-Carlo drawing from this 
model yields a predicted A = 1766.9 it 48.6 with 68.3% confidence; hence, the observed likelihood 
is acceptable. Confidence boundaries on the best-fitting Orjot as obtained from the likelihood 
ratio statistic A — Amin are indicated in the figure. At 68.3% confidence 0.80 < (Trjot < 1.01 and 
at 95.4% confidence 0.48 < Orjat < 1.11. 
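
The way a likelihood-ratio threshold turns into a confidence interval can be sketched as follows. This is a minimal illustration, assuming a single parameter of interest (one degree of freedom), for which the $\chi^2$ cumulative distribution is erf(sqrt(x/2)). The function names and the parabolic $\lambda$ curve are hypothetical and are not the CNOC1 results.

```python
import math

def delta_lambda_threshold(confidence):
    """Delta-lambda cut for 1 degree of freedom (chi^2 CDF = erf(sqrt(x/2)))."""
    lo, hi = 0.0, 100.0
    for _ in range(200):  # bisection on the monotonic CDF
        mid = 0.5 * (lo + hi)
        if math.erf(math.sqrt(mid / 2.0)) < confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def confidence_interval(params, lambdas, confidence=0.683):
    """Range of grid parameters for which lambda - lambda_min stays below the cut."""
    lam_min = min(lambdas)
    cut = delta_lambda_threshold(confidence)
    inside = [p for p, lam in zip(params, lambdas) if lam - lam_min <= cut]
    return min(inside), max(inside)

# Hypothetical lambda curve: a parabola with its minimum at 0.92 (illustration only).
grid = [0.5 + 0.01 * k for k in range(71)]
lam = [1783.1 + ((x - 0.92) / 0.10) ** 2 for x in grid]
interval = confidence_interval(grid, lam, 0.683)
```

The cuts come out close to 1 and 4 for 68.3% and 95.4% confidence, which is the familiar $\Delta\chi^2$ rule for one parameter.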

The structural properties of the ensemble cluster depend on $\sigma_r/\sigma_t$, as shown in Figures 3
and 4. The inner slope and scale length $a$ of the mass density, and also the mass-to-number
density ratio $\rho/\nu$, are of particular interest. Our methods do not directly allow us to obtain
confidence intervals on these quantities. However, approximate confidence intervals are obtained
by determining how the inner slope, $a$ and $\rho/\nu$ vary in Figures 3 and 4 when $\sigma_r/\sigma_t$ is varied over,
e.g., its 68.3% confidence range. This yields an inner slope between 0.70 and 1.13, $0.23 < a < 0.27$,
and over the radial range $0.1 < R < 1$, $0.47 < \log(\rho/\nu) < 0.65$. The data are therefore consistent
with an NFW profile and with a mass-to-number density ratio that is constant to within ~25% over
the radial range $0.1 < R < 1$, both at 68.3% confidence.



3.2.2. Grand-total velocity histogram 

The likelihood is only one of many statistics that one can use to test the validity of a model. 
There are several reasons why it is useful to quantitatively consider other statistics as well. First, 
other statistics will be sensitive to different aspects of the data, and can therefore yield different 
estimates of the best model and different (possibly smaller) confidence regions. Second, there is 
no guarantee that any of our models actually provides an adequate representation of the DF that
underlies the data (e.g., the ensemble cluster may not be perfectly spherical, or may have non-constant
$\beta$). While the likelihood suggests that the best-fit model of §3.2.1 is statistically acceptable, other
statistics may well indicate that none of our models are acceptable. Third, the likelihood analysis
shows that some of our models are more likely than others, despite the fact that all models fit the 
projected velocity dispersion profile equally well. The likelihood provides little insight into why 
this is the case, and other statistics may do a better job in this respect. Based on these arguments 
we have considered some other statistics that address the properties of the grand-total velocity 
histogram for the ensemble cluster. 

The grand-total velocity distribution predicted by a given model is simply: 

$$\mathcal{F}_{\rm tot}(v) = \frac{1}{N} \sum_{j=1}^{N} \mathcal{F}(R_j, v), \qquad (8)$$

where J-{Rj,v) is given by equation (P). However, this method of calculating the predicted 
velocity distribution for a model provides no insight into the random fluctuations caused by the
shot noise. The observed grand-total velocity histogram is determined by a finite number of
galaxies, and this is best simulated in Monte-Carlo manner. For each model with given anisotropy
we therefore draw velocities from the probability distributions $\mathcal{F}(R_j, v)$ for each of the $N$ galaxies
in the sample, which yields a simulated grand-total velocity histogram. This procedure is repeated 
many times, and for each histogram bin we calculate the median occupation of the bin, and the 
range around the median that contains the occupation value for 68.3% of the simulations. 
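
The Monte-Carlo procedure described above can be sketched as follows. This is only an illustration: `toy_velocity` is a hypothetical stand-in for drawing from $\mathcal{F}(R_j, v)$ (a unit Gaussian plus a uniform interloper component that ignores $R$), whereas the actual distributions follow from the dynamical models. The bin count, sample size and number of simulations are likewise assumptions.

```python
import random
import statistics

def toy_velocity(R, rng, f_i=0.115, v_max=4.0):
    """Hypothetical stand-in for F(R_j, v): Gaussian members plus uniform interlopers."""
    if rng.random() < f_i:
        return rng.uniform(-v_max, v_max)
    return rng.gauss(0.0, 1.0)

def simulate_histograms(radii, draw_velocity, n_sims=200, n_bins=20, v_max=4.0, seed=1):
    """Per-bin (median, lower, upper) occupation of |v| histograms over n_sims draws."""
    rng = random.Random(seed)
    width = v_max / n_bins
    occupations = [[] for _ in range(n_bins)]
    for _ in range(n_sims):
        hist = [0] * n_bins
        for R in radii:  # one simulated velocity per galaxy in the sample
            v = abs(draw_velocity(R, rng))
            if v < v_max:
                hist[min(int(v / width), n_bins - 1)] += 1
        for b in range(n_bins):
            occupations[b].append(hist[b])
    summary = []
    for occ in occupations:
        occ.sort()
        lower = occ[int(0.1585 * (n_sims - 1))]  # central 68.3% of the simulations
        upper = occ[int(0.8415 * (n_sims - 1))]
        summary.append((statistics.median(occ), lower, upper))
    return summary

radii = [0.5] * 500  # hypothetical sample of 500 galaxies
summary = simulate_histograms(radii, toy_velocity)
```

Each entry of `summary` gives the median occupation of a bin and the range that contains the occupation in roughly 68.3% of the simulations.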

Figure 7 shows the predicted, normalized grand-total velocity histograms as function of
$|v|$ for three constant-$\beta$ models. In each panel the predictions are shown as a combination of
two thin lines; for each bin, the histogram occupation falls between these lines in 68.3% of the
simulations. The thick line in each of the panels is the observed velocity histogram. The predicted
velocity histogram is approximately Gaussian for the isotropic model; it is more flat-topped than
a Gaussian for tangentially anisotropic models, and more centrally peaked than a Gaussian for
radially anisotropic models. These results are consistent with previous calculations (in other
contexts) of the velocity distributions predicted by constant anisotropy models (e.g., Merritt 1987;
van der Marel & Franx 1993). The observed histogram is neither particularly centrally peaked nor
particularly flat-topped, and this is why the likelihood analysis of §3.2.1 yields a best-fit model
that is close to isotropic.

One method to address whether the differences between the observed and predicted grand-total
velocity histograms are statistically acceptable is through a $\chi^2$ quantity that sums the squared
residuals over all bins of the velocity histogram, weighted with the shot-noise errors predicted
from the Monte-Carlo simulations. We calculated this quantity, and found a minimum $\chi^2 = 26.1$
for the model with $\sigma_r/\sigma_t = 0.91$. The expected value with 20 velocity bins is $\chi^2 = 20.0 \pm 6.6$
at 68.3% confidence, suggesting that the best-fit model is acceptable. Confidence boundaries
on $\sigma_r/\sigma_t$ were calculated from the usual $\Delta\chi^2$ statistic, yielding $0.74 < \sigma_r/\sigma_t < 1.02$ at 68.3%
confidence and $0.47 < \sigma_r/\sigma_t < 1.13$ at 95.4% confidence. These results are virtually identical to
those derived in §3.2.1 from the likelihood statistic $\lambda$. Apparently, little information is lost by
putting all datapoints together in one grand-total velocity histogram (which removes information
on radial dependencies).
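
The histogram $\chi^2$ quantity can be written down compactly. The sketch below uses hypothetical function names and assumes that the per-bin shot-noise errors are supplied by the caller (e.g., from Monte-Carlo simulations). The second function gives the scatter expected if the bin residuals were independent unit Gaussians, sqrt(2 x 20) = 6.3 for 20 bins, which is close to the Monte-Carlo value of 6.6 quoted above.

```python
import math

def histogram_chi2(observed, predicted, shot_noise):
    """Sum over bins of squared residuals, each divided by the squared shot-noise error."""
    return sum(((o - p) / s) ** 2 for o, p, s in zip(observed, predicted, shot_noise))

def gaussian_expectation(n_bins):
    """Mean and scatter of chi^2 for independent unit-Gaussian bin residuals (an idealization)."""
    return float(n_bins), math.sqrt(2.0 * n_bins)

# Hypothetical two-bin example: only the first bin deviates, by one sigma.
chi2_example = histogram_chi2([10, 12], [9, 12], [1, 2])
mean20, scatter20 = gaussian_expectation(20)
```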

More insight can be gained by considering statistics that address the shape of the grand-total 
velocity histogram. By construction, the predicted histograms for models with different $\beta$ are
all normalized, and have the same dispersion. The most obvious shape statistics are therefore 
the kurtosis and other higher order moments (Stuart & Ord 1991). However, these moments are 
very sensitive to the wings of the velocity distribution, which are not particularly well constrained 
(primarily due to interlopers). The Gauss-Hermite moments provide more suitable statistics, since 
they are by construction insensitive to the wings of the distribution (van der Marel & Franx 1993; 
Gerhard 1993). These moments have been used before to describe cluster velocity histograms, but 
primarily as a method for searching for cluster substructure (Zabludoff et al. 1993; see also Colless 
& Dunn 1996). 

To calculate the Gauss-Hermite moments for the CNOC1 ensemble cluster, the observed
velocities were binned into a grand-total histogram $\mathcal{F}_{\rm tot,observed}(v)$ as in Figure 7. The estimated
contribution by interlopers was then subtracted to yield a corrected histogram:

$$\mathcal{F}_{\rm tot,corrected}(v) = \left[ \mathcal{F}_{\rm tot,observed}(v) - (f_i / 2 v_{\max}) \right] / (1 - f_i), \qquad |v| < v_{\max}. \qquad (9)$$

From this corrected histogram we calculated the two lowest order non-trivial Gauss-Hermite
moments, namely $h_4$ and $h_6$. The resulting moments were found to be independent of the choice
of the bin size in the histogram construction; hence they are well defined quantities. The best
estimate for $f_i$ depends slightly on the assumed anisotropy (cf. Figure 5), so the inferred $h_4$ and
$h_6$ do so as well. This is seen in Figure 8, in which dashed curves show the dependence of the
observed $h_4$ and $h_6$ on the assumed anisotropy. However, the dependence on anisotropy is small,
and we find that $h_4$ and $h_6$ are in the range $h_4 = -0.015 \pm 0.005$ and $h_6 = -0.028 \pm 0.006$ for all
anisotropies that we have studied.
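
The calculation can be sketched as follows, using the Gauss-Hermite convention of van der Marel & Franx (1993), in which $h_l = (2\sqrt{\pi}/\gamma) \int L(v)\,\alpha(w)\,H_l(w)\,dv$ with $w = (v - V)/\sigma$ and $\alpha(w) = e^{-w^2/2}/\sqrt{2\pi}$. As a simplifying assumption, the parameters $(\gamma, V, \sigma)$ of the best-fitting Gaussian are taken as inputs here rather than fitted, and the distribution is passed as a function rather than a binned histogram. All function names are hypothetical.

```python
import math

def hermite(order, w):
    """Normalized Hermite polynomials of order 0, 4 and 6."""
    if order == 0:
        return 1.0
    if order == 4:
        return (4 * w**4 - 12 * w**2 + 3) / math.sqrt(24)
    if order == 6:
        return (8 * w**6 - 60 * w**4 + 90 * w**2 - 15) / math.sqrt(720)
    raise ValueError("only orders 0, 4 and 6 are implemented")

def gh_moment(density, order, sigma, mean=0.0, gamma=1.0, n=4000, span=8.0):
    """Midpoint-rule estimate of h_l for a normalized velocity distribution."""
    dv = 2 * span * sigma / n
    total = 0.0
    for k in range(n):
        v = mean - span * sigma + (k + 0.5) * dv
        w = (v - mean) / sigma
        alpha = math.exp(-0.5 * w * w) / math.sqrt(2 * math.pi)
        total += density(v) * alpha * hermite(order, w) * dv
    return 2 * math.sqrt(math.pi) / gamma * total

def correct_for_interlopers(f_obs, f_i, v_max=4.0):
    """Subtract a uniform interloper density f_i/(2 v_max) and renormalize (equation 9)."""
    def corrected(v):
        if abs(v) >= v_max:
            return 0.0
        return (f_obs(v) - f_i / (2 * v_max)) / (1 - f_i)
    return corrected

gauss = lambda v: math.exp(-0.5 * v * v) / math.sqrt(2 * math.pi)
flat = lambda v: 1 / (2 * math.sqrt(3)) if abs(v) < math.sqrt(3) else 0.0
```

For a pure Gaussian this gives $h_0 = 1$ and $h_4 = h_6 = 0$, while a flat-topped distribution such as `flat` (a uniform distribution with unit dispersion) gives $h_4 < 0$.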

We also calculated $h_4$ and $h_6$ for the models. As before, velocity histograms were drawn from
each model in Monte-Carlo fashion. For each ensemble of simulated histograms we calculated
the median and 68.3% confidence interval on $h_4$ and $h_6$. The resulting predictions are shown
in Figure 8 as solid dots with error bars. In essence, the dots are the Gauss-Hermite moments
predicted for the hypothetical case in which complete information on the velocity distribution
were available at the radius of each galaxy in the sample. The error bars show the 1-$\sigma$ shot noise
variations on these values due to the fact that only a discrete realization of the model is available
with a finite number of galaxies. These errors are $|\Delta h_l| \approx 0.02$-$0.03$, independent of either $\beta$ or
the Gauss-Hermite order $l$.

The tangentially anisotropic models predict flat-topped velocity histograms, which have
$h_4 < 0$ and $h_6 > 0$; the radially anisotropic models predict centrally peaked velocity histograms,
which have $h_4 > 0$ and $h_6 < 0$. There is generally more power in the fourth-order term than the
sixth-order term. Even higher-order moments are not particularly useful in the present context,
since all models that we have studied predict $h_l \approx 0$ to within the variations expected from shot
noise, for all $l \geq 8$. The near-zero observed values of $h_4$ and $h_6$ are reproduced only by models that
are close to isotropic. The observed values fall inside the 68.3% confidence interval for both $h_4$
and $h_6$ if $1.00 < \sigma_r/\sigma_t < 1.05$. They fall inside the 95.4% confidence interval for both $h_4$ and $h_6$ if
$0.75 < \sigma_r/\sigma_t < 1.21$. These results are broadly consistent with those derived in §3.2.1 and §3.2.2,
but not entirely. In particular, tangentially anisotropic models are ruled out at higher confidence
by $h_4$ and $h_6$ than they are by the likelihood statistic $\lambda$.

Footnote: The best-fitting Gaussian is used to generate the Gauss-Hermite basis, so by definition $h_0 = 1$ and $h_2 = 0$. The
odd moments are all zero for a symmetrical distribution.

Footnote: The underlying reason why the moments are independent of the bin size is that each Gauss-Hermite moment is
sensitive to features in the velocity distribution on a particular characteristic velocity scale. This scale becomes
finer for moments of higher order, as in a Fourier decomposition (Gerhard 1993). We find that velocity binning has
no influence on the calculated moments, as long as the bin size of the histogram is smaller than the characteristic
velocity scale for the highest order considered, $h_6$ in this case.

Footnote: In the approach of Figure 8, the observed Gauss-Hermite moments are fixed numbers without error bars (they
are well defined integrals over the observed histogram), while the model predictions for the observed values have
a shot-noise error. An alternative view is to think of the observed Gauss-Hermite moments as estimates of the
moments of the underlying distribution. In this view, the models predict a fixed value (the dots in the figure), and
the Monte-Carlo calculated shot-noise errors should be thought of as the errors on the observed values.

Footnote: The fact that the shot-noise errors are independent of the Gauss-Hermite order is due to the fact that the
Gauss-Hermite functions form an orthonormal basis.



4. Possible uncertainties and additional considerations 

The dynamical analysis in the preceding sections has yielded several interesting conclusions 
about the structure of clusters of galaxies. Although the analysis has been significantly more 
detailed than many previous studies for this and other data sets, it still involves a number of 
simplifying assumptions. The present section discusses how the approximations in the analysis 
may have affected the conclusions, and how robust the conclusions are in view of this. 



4.1. Treatment of interlopers 

Our sample definition assumes that cluster members have $|v| < v_{\max} = 4$ in units of the
cluster dispersion. So we identify all galaxies with observed velocities in excess of this limit as
interlopers, similar to the approach adopted by Yahil & Vidal (1977) (they use $v_{\max} = 3$). Other
than this very conservative cut, no removal of interlopers from the sample is attempted. This
approach differs from that used by most authors who have modeled the dynamics of clusters of
galaxies. Fairly complex schemes have recently been developed for the identification of interlopers,
with estimated success rates of up to 90% (e.g., Perea, del Olmo & Moles 1990; den Hartog &
Katgert 1996). These schemes generally rely on mass estimates obtained with specific assumptions 
about the orbital structure. In the present context the orbital structure is what we hope to 
determine, so use of these schemes could possibly introduce subtle biases. Although the schemes 
are tailored to be robust and conservative, we feel it is safer in the present context to model the 
presence of interlopers in a statistical sense, rather than to try to remove them. Note that this 
modeling would still be necessary even if we did try to remove interlopers, because no interloper 
identification scheme can be 100% successful. 

In our statistical treatment of interlopers we assumed that the density of interlopers is
homogeneous in velocity. This assumption is directly verifiable, since the density of interlopers
at velocities $|v| > v_{\max}$ can be determined from the CNOC1 dataset. To this end we extracted
from the CNOC1 survey as before the galaxies with K-corrected absolute Gunn r-band magnitude
$M_r < -18.5$, with exclusion of galaxies in the clusters MS 0906+11 and MS 1358+62. From the
resulting sample we calculated the density of galaxies in $(R, v)$ space over the region defined by
$R < 1.5$ and $5 < |v| < 10$, and also over the region defined by $R < 1.5$ and $|v| < v_{\max} = 4$.
Comparison of these densities yields a direct estimate of the interloper fraction $f_i$ for the sample
studied in §3. This approach yields $f_i = 0.117$, fully consistent with the values inferred from
the likelihood analysis shown in Figure 5 (the best-fitting model identified by Figure 6 has
$f_i = 0.108 \pm 0.017$). This provides direct evidence that the density of interlopers is similar for
$|v| < v_{\max} = 4$ as for $5 < |v| < 10$, thus supporting the assumption underlying our analysis. This
should also serve as a warning for interloper removal schemes, which can never identify interlopers
with small velocities.

Footnote: Note that we are not assuming that the probability for a galaxy to be an interloper is independent of $v$. Interlopers
are assumed to have a homogeneous density in $v$, while cluster members have a density that increases strongly towards
$|v| = 0$ (see Figure 7). Hence, the probability for a galaxy to be an interloper is a strongly increasing function of $|v|$.
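
The density comparison amounts to a short calculation, sketched below under the stated assumption that interlopers are uniform in $v$ and that both regions cover the same range in $R$. The galaxy counts in the example are hypothetical and are chosen only to illustrate the arithmetic. The function name is likewise an assumption.

```python
def interloper_fraction(n_inner, n_outer, v_max=4.0, v_lo=5.0, v_hi=10.0):
    """Estimate f_i from counts at |v| < v_max and at v_lo < |v| < v_hi (same range in R)."""
    density_outer = n_outer / (2.0 * (v_hi - v_lo))     # interlopers per unit velocity
    expected_interlopers = density_outer * 2.0 * v_max  # extrapolated to |v| < v_max
    return expected_interlopers / n_inner

# Hypothetical counts: 1000 galaxies at |v| < 4 and 146 galaxies at 5 < |v| < 10.
f_i_example = interloper_fraction(1000, 146)
```

The factor of 2 in both widths accounts for positive and negative velocities; it cancels in the ratio but is kept for clarity.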

The second assumption in our analysis is that the probability $f_i$ for a galaxy at known $R$ but
unknown $v$ to be an interloper is independent of $R$. This appears somewhat counter-intuitive,
since the surface density $\Sigma(R)$ of cluster members is strongly peaked towards $R = 0$. So if the
surface density of interlopers $\Sigma_{\rm int}(R)$ is homogeneous, one would expect $f_i$ to decrease towards
$R = 0$ approximately as $f_i \propto [\Sigma(R)]^{-1}$. By contrast, our assumption implies $\Sigma_{\rm int}(R) \propto \Sigma(R)$,
i.e., that the density of interlopers (with $|v| < v_{\max} = 4$) increases as steeply towards $R = 0$ as
does the density of cluster members. This may not necessarily be incorrect, because matter not
belonging to the cluster may in fact be strongly clustered towards it. To address the validity of our
assumption we have taken an empirical approach, by dividing the sample of §3 in two equally sized
subsamples of galaxies at small and large $R$, respectively. The first subsample contains galaxies
with $R < 0.41$, and has a median radius $R_{\rm med,1} = 0.23$; the second subsample contains galaxies
with $0.42 < R < 1.5$, and has a median radius $R_{\rm med,2} = 0.73$. For each subsample we redid the
likelihood analysis, using an isotropic model. This yields interloper fractions $f_{i,1} = 0.083 \pm 0.024$
for the first subsample, and $f_{i,2} = 0.125 \pm 0.025$ for the second subsample. So $f_i$ does appear to
decrease with decreasing radius, but this is barely significant at the 1-$\sigma$ level. However, $\Sigma(R_{\rm med,1})$
is ~5 times as large as $\Sigma(R_{\rm med,2})$, so $f_i$ certainly does not decrease towards $R = 0$ as fast as
predicted if $\Sigma_{\rm int}(R)$ were independent of $R$. So although our assumption may not be fully correct,
it does seem acceptable.
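
The last step of this argument can be made explicit with a small calculation. The sketch below uses the approximate scaling quoted in the text (interloper fraction inversely proportional to the member surface density) together with the measured values. The function names are hypothetical, and treating the subsample results as applying at the median radii is an assumption.

```python
def expected_inner_fraction(f_outer, density_ratio):
    """Inner interloper fraction if interlopers were homogeneous, using f_i ~ 1/Sigma."""
    return f_outer / density_ratio

def deviation_in_sigma(measured, error, expected):
    """Offset between a measured and an expected value in units of the quoted error."""
    return (measured - expected) / error

f_expected = expected_inner_fraction(0.125, 5.0)          # 0.025
offset = deviation_in_sigma(0.083, 0.024, f_expected)     # about 2.4
```

Under these assumptions a homogeneous interloper density would predict an inner interloper fraction of about 0.025, which lies roughly 2.4 times the quoted error below the measured value of 0.083.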

All in all, none of the tests that we have done has given us reason to believe that our 
treatment of interlopers is significantly in error. Also, interlopers make up only a small fraction 
(~11%) of the sample studied in §3, and we do not expect the final results to be particularly
sensitive to their treatment. 



4.2. Choice of distribution functions 

There are several restrictions to the dynamical analysis that we have employed. Although
we construct phase-space distribution functions and model the full CNOC1 $(R, v)$ dataset, we do
restrict ourselves to a specific set of models. We study only models with constant anisotropy,
and even among the DFs that generate models with constant anisotropy, we choose a particular
set. Realistic systems are unlikely to have a velocity dispersion anisotropy that is independent of
radius, and it is not a priori clear whether our results can provide any information on what range
of such models could be acceptable.

Footnote: C97b also assumed the density of interlopers to be homogeneous in velocity space (in their velocity dispersion
calculations), but they used a somewhat larger velocity range, $5 < |v| < 25$, to determine the interloper density. This
yields results that do not differ significantly from those obtained here.

Despite the shortcomings of our analysis, there are reasons to believe that our results may 
be more robust than they would otherwise seem. In a nutshell, our analysis boils down to the 
fact that the observed grand-total velocity histogram for the CNOC1 sample is close to Gaussian,
and this is not generally expected for models that have significant velocity anisotropy. Van der
Marel & Franx (1993) presented a simple method for calculating the Gauss-Hermite moments for
constant-$\beta$ models with various scale-free number densities and potentials. Such models show
that the Gauss-Hermite moments are determined primarily by the overall velocity dispersion
anisotropy, and to much smaller extent by the details of the number density profile and potential.
Gerhard (1993) reached the same conclusion for models with varying $\beta$. So even though we
have parameterized $\nu(r)$, $\rho(r)$ and $\beta$ in our approach, it is likely that even with a more general
non-parametric approach the observations would still imply a velocity distribution that is close to 
isotropic. 

The approach that we have used shares many similarities with that used by Merritt & Saha
(1993) to interpret 296 measured velocities for the Coma cluster. In particular, we follow their
approach of maximizing the likelihood for the $(R, v)$ dataset. Like us, they use a restricted
parameterization for the mass profile; while they adopt a two-parameter family for the gravitational 
potential, we adopt a three-parameter family for the mass density. The technique of Merritt & 
Saha does have two important advantages over our approach: first, it makes a basis function 
expansion of the DF that allows fairly arbitrary anisotropy profiles; and second, it avoids binning 
and parameterization of the projected number density profile. On the other hand, our method 
expands on that of Merritt & Saha through the inclusion of explicit modeling of interloper 
contamination, the Monte-Carlo simulation of the dataset to estimate confidence regions and the 
goodness-of-fit, and the use of Gauss-Hermite moments. A hybrid version of our techniques would 
probably allow the most robust conclusions to be reached, but this is beyond the scope of the 
present paper. 



4.3. Brightest Cluster Galaxies 

Our analysis has included the brightest cluster galaxies (BCGs) of each of the clusters in 
the sample. BCGs have very special properties; they tend to be atypically bright, and they are 
generally found at $(R, v) \approx (0, 0)$. So it is not clear whether it is appropriate to treat them on a par
with the other cluster members, as we have done. Exclusion of the BCGs from our sample would,
among other things, decrease the projected number density $\Sigma(R)$ at small radii. Specification of
$\Sigma(R)$ is the first step in our modeling approach, so our entire analysis would need to be repeated
to determine accurately whether exclusion of the BCGs would alter the results of our dynamical
models. To avoid this, we have chosen a more approximate route to address this issue. We
removed the BCGs from the sample, and then recalculated the Gauss-Hermite moments of the
observed grand-total velocity histogram. The BCGs tend to have $v \approx 0$, so their removal makes
the histogram more flat-topped. We find that $h_4$ and $h_6$ for the resulting sample are in the range
$h_4 = -0.024 \pm 0.005$ and $h_6 = -0.022 \pm 0.006$ independent of the assumed anisotropy (which, as
in Figure 8, influences the Gauss-Hermite moments through the best-fitting interloper fraction
$f_i$). As discussed in §4.2, the Gauss-Hermite moments predicted by dynamical models depend
primarily on the anisotropy, and to a much lesser extent on the number density profile. So it may
be reasonable to use the theoretical relation between $h_4$, $h_6$ and anisotropy in Figure 8, despite
the fact that it was derived for a somewhat different $\Sigma(R)$. The observed estimates of $h_4$ and $h_6$
without BCGs then imply best-fitting models that are somewhat more tangentially anisotropic
than those derived in §3, but only by $\Delta\log(\sigma_r/\sigma_t) = -0.03$. This change is small compared to the
size of the confidence regions on the best-fitting models. Hence, the inclusion or removal of BCGs
has no significant effect on the main conclusions from our analysis.



4.4. Non-sphericity 

Clusters of galaxies are generally flattened. The mode of the distribution of projected axial
ratios for clusters is ~0.6, while the mode of the distribution of intrinsic axial ratios is $q \approx 0.45$
(de Theije, Katgert, & van Kampen 1995). It is important to know what this implies for the
validity of our spherical modeling. 



4.4.1. Grand-total velocity distribution

We consider as a simple test case axisymmetric clusters of fixed intrinsic axial ratio $q$, with
number density profiles of the form studied by Hernquist (1990),

$$\nu(R, z) = \frac{1}{2\pi q}\, m^{-1} (1 + m)^{-3}, \qquad m^2 = R^2 + (z^2 / q^2), \qquad (10)$$

and constant mass-to-number density ratios $\rho/\nu = 1$. These models have scale length $a = 1$ and
total mass $M = 1$. We restrict ourselves to the simplest type of axisymmetric systems, namely those
in which the DF depends only on the two classical integrals of motion, $f = f(E, L_z)$. We denote
by $\mathcal{F}_q(R, \eta; v; i)\,dv$ the probability that a star at position $(R, \eta)$ in a polar coordinate system on
the sky, residing in a system of axial ratio $q$ that is viewed at inclination $i$, is observed to have
line-of-sight velocity between $v$ and $v + dv$. The probability distributions $\mathcal{F}_q$ can be conveniently
calculated for two-integral models using, e.g., the method of Magorrian & Binney (1994). The
grand-total velocity probability distribution for a system of given $q$ and $i$ is given by

$$\mathcal{F}_q(v; i) = \int \Sigma_q(R, \eta; i)\, \mathcal{F}_q(R, \eta; v; i)\, R\,dR\,d\eta \Big/ \int \Sigma_q(R, \eta; i)\, R\,dR\,d\eta, \qquad (11)$$

where $\Sigma_q(R, \eta; i)$ is the projected number density obtained when the $\nu(R, z)$ given by equation (10)
is viewed at the given inclination, and where the integrals extend over the two-dimensional plane
of the sky. We denote by $\sigma_q(i)$ the velocity dispersion corresponding to the distribution $\mathcal{F}_q(v; i)$.
We consider now the case of a large number of identical axisymmetric clusters of fixed intrinsic
axial ratio $q$, viewed from random viewing directions. From these clusters we build an ensemble
cluster, as we have done for the CNOC1 data set. The grand-total velocity probability distribution
of the ensemble cluster is given by

$$\mathcal{F}_q(v) = \int \left[ \mathcal{F}_q(v/\sigma_q(i); i) / \sigma_q(i) \right] \sin i\, di \Big/ \int \sin i\, di, \qquad (12)$$

where the integral is over all inclination angles $i \in [0, \pi/2]$, and where $v$ is now a dimensionless
velocity.

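The velocity anisotropy of an $f(E, L_z)$ model with fixed axial ratio $q$ can be characterized by
the quantity

$$\langle \sigma_r/\sigma_t \rangle \equiv \left[ 2 \langle v_r^2 \rangle \big/ \left( \langle v_\theta^2 \rangle + \langle v_\phi^2 \rangle \right) \right]^{1/2}, \qquad (13)$$

where the angle brackets denote mass-weighted averages over the system. For a spherical system
with constant anisotropy, $\langle \sigma_r/\sigma_t \rangle$ reduces to the same quantity $\sigma_r/\sigma_t$ that we have used to
characterize anisotropy in, e.g., Figures 3, 5, 6 and 8. For an axisymmetric $f = f(E, L_z)$ system,
$\langle \sigma_r/\sigma_t \rangle$ is determined uniquely by the axial ratio $q$, according to the tensor virial theorem (Binney
& Tremaine 1987). For an oblate two-integral system,

$$\langle \sigma_r/\sigma_t \rangle_q = \left[ 2 \left( 1 - \sqrt{1 - e^2}\, \frac{\arcsin e}{e} \right) \Big/ \left( \frac{1}{\sqrt{1 - e^2}}\, \frac{\arcsin e}{e} - 1 \right) \right]^{1/2}, \qquad (14)$$

where the eccentricity is defined by $e^2 = 1 - q^2$.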
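
The tensor virial relation for an oblate two-integral system is easy to evaluate numerically. The sketch below assumes the standard oblate-spheroid form from Binney & Tremaine (1987), in which the anisotropy measure is [2(1 - q A)/(A/q - 1)]^(1/2) with A = arcsin(e)/e and e^2 = 1 - q^2. The function name is hypothetical. For q = 0.6 it reproduces the value 0.81 quoted in this paper.

```python
import math

def anisotropy_two_integral(q):
    """<sigma_r/sigma_t> for an oblate f(E, L_z) system with axial ratio 0 < q <= 1."""
    if q >= 1.0:
        return 1.0  # spherical limit: isotropic
    e = math.sqrt(1.0 - q * q)  # eccentricity
    a = math.asin(e) / e
    return math.sqrt(2.0 * (1.0 - q * a) / (a / q - 1.0))

for q in (0.3, 0.45, 0.6, 0.8, 1.0):
    print(q, round(anisotropy_two_integral(q), 3))
```

Flatter systems come out more tangentially anisotropic in this measure, with the value tending to 1 as q approaches 1.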



For several values of the axial ratio ranging from $q = 0.3$ to $q = 1$ we calculated the velocity
distributions $\mathcal{F}_q(v)$ as pertaining to an ensemble of randomly oriented clusters. In the integrals
of equation (11) we included only radii $R \in [0.1, 3]$ to approximate the range of radii for which
data is available in the CNOC1 data set. From each distribution we calculated the corresponding
fourth-order Gauss-Hermite moments $h_{4,q}$, which describe the extent to which the distributions
deviate from Gaussians, and for each axial ratio we also calculated the velocity anisotropy
parameter $\langle \sigma_r/\sigma_t \rangle_q$. The dots in Figure 9 connected by a solid curve show $h_{4,q}$ as function of
$\langle \sigma_r/\sigma_t \rangle_q$. For comparison, we also calculated the relation between $h_4$ and $\sigma_r/\sigma_t$ for spherical
constant-$\beta$ models with a Hernquist density profile, using the approach of §3.1. Those results
are shown as a dashed curve in Figure 9. The results of these two very different calculations are
virtually identical.



4.4.2. Radial dependence of model quantities

Figure 9 suggests that it is reasonable to model the grand-total velocity distribution of an
ensemble of axisymmetric clusters with a spherical model. However, for a spherical model to be
appropriate it must also predict the correct radial dependence for all observable quantities. The
projected intensity profile of an ensemble of axisymmetric clusters is given by

$$\Sigma_q(R) = \frac{1}{2\pi} \int\!\!\int \Sigma_q(R, \eta; i)\, d\eta\, \sin i\, di, \qquad (15)$$

where the integrals are over $\eta \in [0, 2\pi]$ and $i \in [0, \pi/2]$. The radial profiles of the kinematical
quantities are obtained by evaluating equation (11) without the integrals over $R\,dR$, followed by
integration over inclinations as in equation (12). This yields velocity probability distributions
from which one can calculate the run of the velocity dispersion and the Gauss-Hermite moment
$h_4$. We did these calculations for an ensemble of two-integral Hernquist models with axial ratio
$q = 0.6$; the solid curves in Figure 10 show the resulting profiles.

We used the approach of §3.1 to calculate a constant-$\beta$ DF for the spherical model that has
the same projected intensity profile as the ensemble of axisymmetric clusters, with $\sigma_r/\sigma_t$ fixed to
the value $\langle \sigma_r/\sigma_t \rangle = 0.81$ that applies to an $f(E, L_z)$ model with $q = 0.6$. The dashed curves in
the bottom two panels of Figure 10 show the resulting $\sigma(R)$ and $h_4(R)$. The profiles are closely
similar to those for the ensemble cluster, with residuals $|\Delta\log\sigma| \lesssim 0.04$ and $|\Delta h_4| \lesssim 0.02$ at
all radii of interest. The projected intensity profile of the ensemble cluster is almost identical
(residuals $|\Delta\Sigma| \lesssim 0.01$) to the projected intensity profile of a spherical Hernquist model with a
scale length $a$ for which $\log a = -0.046$. This also differs only slightly from the value ($\log a = 0$)
for the axisymmetric clusters from which the ensemble was constructed.



The curves for the ensemble in Figure 10 were obtained without scaling each cluster with its
individual velocity dispersion $\sigma_q(i)$ in equation (12). The normalizations of $\sigma(R)$ for the ensemble
cluster and the spherical model are therefore similar in an absolute sense. This is important for
cluster mass determinations, since $M \propto \sigma^2$. It implies that the average mass (or mass-to-light
ratio) of a set of clusters can be adequately determined from spherical models, even if the clusters
are individually axisymmetric.



4.4.3. Dependence on axial ratio and anisotropy

The $f(E, L_z)$ models discussed so far provide only one possible dynamical structure for
axisymmetric systems. In general, the construction of DFs for axisymmetric systems is a
complicated problem (e.g., van der Marel et al. 1998; Cretton et al. 1999) that is beyond the scope
of the present paper. However, if we restrict ourselves to the case of asymptotically large radii
then the problem becomes more tractable. An axisymmetric Hernquist model then reduces to a
scale-free spheroidal mass density in a Kepler potential. Two families of three-integral DFs for
this limiting case were presented by de Bruijne et al. (1996). Each family has one free parameter
called $\beta$. In the spherical limit both families reduce to the spherical constant-$\beta$ models that we
have discussed in §3.1, but in the more general axisymmetric case the families have different
properties. The 'case I' DFs for a given axial ratio are generalizations of the $f(E, L_z)$ models;
they reduce to the $f(E, L_z)$ model for $\beta = 0$. The 'case II' DFs for a given axial ratio correspond
to models in which the velocity anisotropy $\beta = 1 - \sigma_t^2/\sigma_r^2$ is constant throughout the system,
whereas the anisotropies $\sigma_\theta/\sigma_r$ and $\sigma_\phi/\sigma_r$ are not. The case II DFs have a particularly interesting
property in the present context. If one constructs an ensemble of axisymmetric models with this
DF, then the entire projected velocity distribution of this ensemble (i.e., the velocity dispersion
and all Gauss-Hermite moments) is identical to that for a spherical model with the same $\beta$. This
is true for any axial ratio $q$ and velocity anisotropy $\beta$. So if clusters of galaxies are axisymmetric
with DFs of this type, then the construction of spherical models for an ensemble of such clusters
will yield exactly the correct results (at large radii), even for extreme axial ratios or anisotropies.

Footnote: It was verified that the agreement between $\sigma(R)$ and $h_4(R)$ for the ensemble cluster and the spherical model does
not deteriorate when one does include the scaling of clusters with their individual velocity dispersion (as we have
done for the CNOC1 dataset).

In general, the differences between the predictions of an ensemble of axisymmetric systems 
and a corresponding spherical model will depend on the details of the shapes and DFs of clusters 
of galaxies, which are not known. However, the combined results of all the tests that we have 
done indicate that the agreement in all quantities of interest is generally quite good. Hence, it 
appears adequate to use spherical models to interpret data for an ensemble of clusters that may 
individually not be spherical. 



4.5. Homology 

The CNOC1 ensemble cluster that we have analyzed was constructed under the assumption
that clusters form a homologous set. This allowed us to scale all data in both radius and velocity.
The virial theorem for a collisionless system states that $\sigma^2 = GM/r_g$, where $\sigma$ is the grand-total
velocity dispersion of the system, $M$ is its mass, and $r_g$ is the gravitational radius (Binney &
Tremaine 1987). To properly scale the data for a homologous set of clusters, one would need to
scale all velocities by $\sigma$ and all radii by $r_g$. However, $r_g = GM^2/|W|$ is defined in terms of the
potential energy $W$ of the system, which itself depends on the exact radial profile of the mass
density (Binney & Tremaine 1987). This is what we wish to determine, and $r_g$ is therefore not
known a priori for individual clusters. Lacking knowledge of $r_g$, we scaled all radii with $r_{200}$ when
building the CNOCl ensemble cluster in §^. 

One may wonder whether the scaling of the data with $r_{200}$ may have biased the results of our
analysis in some way. The likely answer is that it has not. In §3.2.2 we compared model predictions
to the grand-total CNOC1 velocity histogram. This histogram is entirely independent of the
radial scaling adopted in the construction of the ensemble cluster; it only depends on the velocity
scaling. Nonetheless, the results obtained from the grand-total CNOC1 velocity histogram are
in good agreement with those obtained from the more complete likelihood analysis, which does
depend on the details of the radial scaling adopted in the building of the ensemble cluster. This
indicates that our results are robust, and not particularly sensitive to the adopted radial scaling.

The radii $r_{200}$ for the CNOC1 clusters were calculated from the relation $r_{200} = \sqrt{3}\,\sigma/[10 H(z)]$, 
where $H(z)$ is the Hubble constant at redshift $z$ (C97b). To assess the influence of the radial 
scaling further we did some tests with ensembles of hypothetical clusters that obey a relation of 
the form $R \propto \sigma^{\tau}$ with $\tau \neq 1$. These were then analyzed under the assumption $R \propto \sigma$ that was 
made in the analysis of the CNOC1 data. These tests confirmed that the results of our analysis 
are not very sensitive to possible errors in the adopted radial scaling, for reasonable values of $\tau$ 
(Section 3.1 of Schaeffer et al. 1993 indicates $\tau \approx 1.15 \pm 0.3$, so the scaling adopted for the CNOC1 
dataset is itself not inconsistent with the data). 
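
The radial and velocity scaling described above can be sketched in a few lines of Python. This is a minimal illustration only: the function names, the default cosmological parameters, and the matter-plus-curvature-plus-constant form adopted for $H(z)$ are assumptions made here for the example, not values or code taken from C97b.

```python
import math

def hubble(z, h0=100.0, omega_m=0.2, omega_lambda=0.0):
    """H(z) in km/s/Mpc for a Friedmann model (parameter values are assumed)."""
    omega_k = 1.0 - omega_m - omega_lambda
    e2 = omega_m * (1.0 + z) ** 3 + omega_k * (1.0 + z) ** 2 + omega_lambda
    return h0 * math.sqrt(e2)

def r200(sigma, z, **cosmo):
    """r_200 in Mpc from r_200 = sqrt(3) * sigma / (10 H(z)), sigma in km/s."""
    return math.sqrt(3.0) * sigma / (10.0 * hubble(z, **cosmo))

def scale_galaxy(r_proj, v_los, sigma, z, **cosmo):
    """Return the dimensionless (R, v) pair for one galaxy.

    r_proj is the projected radius in Mpc, and v_los is the line-of-sight
    velocity relative to the cluster mean in km/s.
    """
    return r_proj / r200(sigma, z, **cosmo), v_los / sigma
```

Because $H(z)$ increases with redshift in such a model, a cluster of given $\sigma$ is assigned a smaller $r_{200}$ at higher $z$.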

There is some evidence that clusters may form a homologous set, because 
their global properties appear to lie on a fundamental plane, as for elliptical galaxies (Schaeffer 
et al. 1993). However, it is still quite possible that clusters are in fact not homologous at all. If 
so, our results would only (at best) be valid in some average sense. For example, our conclusion 
that the CNOC1 ensemble cluster is consistent with an isotropic velocity distribution may well be 
compatible with the (contrived) hypothesis that half of the CNOC1 clusters are strongly radially 
anisotropic and the other half are strongly tangentially anisotropic. 

The possible presence of cluster substructure may be another cause of non-homology. 
However, the two clusters in which substructure is most readily apparent, MS 0906+11 and MS 
1358+62, were excluded from the analysis. X-ray images for the remaining clusters appear mostly 
regular, and dynamical and X-ray mass estimates agree well (Lewis et al. 1999); this argues against 
significant substructure. Also, the procedure of adding individual clusters together minimizes the 
effect of any possible substructure in the dynamical analysis. 



5. Discussion and conclusions 

In the context of the CNOC1 cluster survey, redshifts were obtained for galaxies in 16 clusters 
at $z = 0.17$-$0.55$, selected on the basis of their X-ray luminosity (e.g., C96). The resulting sample 
is ideally suited for an analysis of the internal velocity and mass distribution of clusters of galaxies. 
Previous analyses of this dataset were based on Jeans equation models for the projected velocity 
dispersion profile $\sigma(R)$ (C97b,c). The results of such models always have a strong degeneracy 
between the mass density profile $\rho(r)$ and the velocity dispersion anisotropy profile $\beta(r)$, because 
these two functions of one variable are only constrained by one function of one variable. Here we 
have attempted to break this degeneracy by analyzing not only $\sigma(R)$, but the full $(R, v)$ dataset of 
the CNOC1 cluster survey. The sample consists of 990 galaxies with $R < 1.5$ and $|v| < v_{\max} = 4$ 
in dimensionless units. 
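
The origin of this degeneracy can be made explicit with the standard spherical Jeans equation and its projection along the line of sight (e.g., Binney & Tremaine 1987). The block below is a textbook summary rather than a reproduction of the equations of the earlier sections; it assumes that $\nu(r)$ denotes the galaxy number density, $\Sigma(R)$ its projection, and that $\sigma_t$ is the one-dimensional tangential dispersion, so that $\sigma_r/\sigma_t = 1$ for an isotropic model.

```latex
\frac{d(\nu\sigma_r^2)}{dr} + \frac{2\beta(r)}{r}\,\nu\sigma_r^2
   = -\,\nu\,\frac{G M(r)}{r^2},
\qquad
\beta(r) \equiv 1 - \frac{\sigma_t^2}{\sigma_r^2},
\qquad
M(r) = 4\pi \int_0^r \rho(r')\, r'^2 \, dr' ,

\Sigma(R)\,\sigma^2(R)
   = 2 \int_R^\infty \left( 1 - \beta(r)\,\frac{R^2}{r^2} \right)
     \frac{\nu\sigma_r^2\; r\, dr}{\sqrt{r^2 - R^2}} .
```

With $\nu(r)$ fixed by the observed number counts, the single observed function $\sigma(R)$ has to constrain both $\beta(r)$ and $\rho(r)$, so a wide range of anisotropies can each be paired with a mass profile that reproduces the same $\sigma(R)$.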

In §2 we repeated some of the previous Jeans modeling, although with a somewhat different 
approach from that employed by C97b,c. The results provide an illustration of the degeneracies 
involved in the modeling, and serve as a useful starting point for a more detailed analysis. In §3 
we presented an analysis of the full $(R, v)$ dataset, using a one-parameter family of models with 
different constant velocity dispersion anisotropy, each using a different mass density profile $\rho(r)$ 
and providing the same acceptable fit to $\sigma(R)$. The best-fit model was sought using a variety 
of statistics, including the likelihood of the dataset, and the Gauss-Hermite moments of the 
grand-total velocity histogram. The confidence regions and goodness-of-fit for the best-fit model 
were determined using Monte-Carlo simulations. Although the results differ slightly depending on 
which statistic is used, all statistics agree that the best-fit model is close to isotropic. The isotropic 
model is acceptable at the 1-$\sigma$ confidence level for all statistics used. For none of the statistics 
does the 1-$\sigma$ confidence region extend below $\sigma_r/\sigma_t = 0.74$, or above $\sigma_r/\sigma_t = 1.05$. 
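
As an illustration of how Gauss-Hermite moments quantify the shape of a velocity histogram, the following sketch estimates $h_4$ and $h_6$ directly from a list of velocities. It uses the normalized Hermite polynomials of van der Marel & Franx (1993). It assumes, for simplicity, that the reference Gaussian has the sample mean and rms dispersion and unit line strength; the function names are hypothetical, and the procedure actually used for the CNOC1 histogram (including the interloper correction) may differ.

```python
import math
import random

def hermite(order, w):
    """Normalized Hermite polynomials H_4 and H_6."""
    if order == 4:
        return (4.0 * w**4 - 12.0 * w**2 + 3.0) / math.sqrt(24.0)
    if order == 6:
        return (8.0 * w**6 - 60.0 * w**4 + 90.0 * w**2 - 15.0) / math.sqrt(720.0)
    raise ValueError("only orders 4 and 6 are implemented in this sketch")

def gauss_hermite_moment(velocities, order):
    """Estimate h_order as 2 sqrt(pi) times the sample mean of alpha(w) H(w)."""
    n = len(velocities)
    mean = sum(velocities) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in velocities) / n)
    total = 0.0
    for v in velocities:
        w = (v - mean) / sigma
        alpha = math.exp(-0.5 * w * w) / math.sqrt(2.0 * math.pi)
        total += alpha * hermite(order, w)
    return 2.0 * math.sqrt(math.pi) * total / n

rng = random.Random(1)
gaussian_sample = [rng.gauss(0.0, 1.0) for _ in range(100000)]
flat_sample = [rng.uniform(-1.0, 1.0) for _ in range(100000)]
h4_gaussian = gauss_hermite_moment(gaussian_sample, 4)  # expected near zero
h4_flat = gauss_hermite_moment(flat_sample, 4)          # expected negative
```

A Gaussian sample gives $h_4$ close to zero, while a flat-topped (here uniform) sample gives a negative $h_4$, in line with the qualitative behavior described for tangentially anisotropic models.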

Cosmological $N$-body simulations for galaxy clusters generally predict velocity distributions 
that are isotropic near the center, and that become somewhat radially anisotropic towards $\sim r_{200}$ 
(e.g., Crone et al. 1994; Cole & Lacey 1996; Ghigna et al. 1998). The maximum radial anisotropy 
is not large, $\sigma_r/\sigma_t \approx 1.3$. The number density weighted anisotropy over the region sampled by the 
CNOC1 data is $\sigma_r/\sigma_t \approx 1.1$. Although this value is only allowed by our analysis at the 2-$\sigma$ level, 
theory and observations do seem to agree that the velocity distribution of galaxy clusters is not 
strongly anisotropic. 

NFW have argued that dark matter halos have a universal mass density profile characterized 
by three main parameters: an inner power-law slope $\gamma = 1$; an outer power-law slope 3; and a 
characteristic scale $a$ where the profile changes from its inner to its outer slope, $a \approx 0.2$-$0.3$ in 
units of $r_{200}$ (depending somewhat on the adopted cosmology; cf. C97c). In our models we have 
parameterized the cluster mass density similarly to NFW, but with $\gamma$ and $a$ as free parameters. 
Models with different velocity dispersion anisotropy require very different values of $\gamma$ and $a$ to 
fit the projected velocity dispersion profile. Only models that are close to isotropic have $\gamma \approx 1$ 
(cf. Figure 3), and our analysis of the full $(R, v)$ dataset shows that such models are in fact the 
only ones that provide a statistically acceptable fit to the data. Such models have $a \approx 0.24$, 
consistent with the predictions of NFW. For these models the logarithmic slope of $\rho(r)$ at the last 
data point ($R = 1.5$) is $\sim 2.7$, so it cannot be established whether the mass density slope actually 
converges to 3 at large radii, as predicted by NFW. These models have an approximately constant 
mass-to-number density ratio, by contrast to the strongly anisotropic models that are inconsistent 
with the data (cf. Figure 4). 
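
The quoted slope at the last data point can be checked with a short calculation. The sketch below assumes the generalized NFW form $\rho(r) = \rho_0\,(r/a)^{-\gamma}(1 + r/a)^{\gamma - 3}$, which has inner slope $\gamma$, outer slope 3 and scale radius $a$ as described above; the exact functional form used in the modeling is the one given in the earlier sections, and the function names here are hypothetical.

```python
def density(r, a, gamma, rho0=1.0):
    """Assumed generalized NFW density: rho0 * x**(-gamma) * (1 + x)**(gamma - 3)."""
    x = r / a
    return rho0 * x ** (-gamma) * (1.0 + x) ** (gamma - 3.0)

def log_slope(r, a, gamma):
    """Logarithmic slope -d ln(rho) / d ln(r) of the assumed profile."""
    x = r / a
    return gamma + (3.0 - gamma) * x / (1.0 + x)

# Near-isotropic best-fit values quoted in the text: gamma ~ 1, a ~ 0.24.
slope_at_edge = log_slope(1.5, a=0.24, gamma=1.0)
```

For $\gamma = 1$ and $a = 0.24$ this gives a slope of about 2.72 at $r = 1.5$, which agrees with the value of $\sim 2.7$ quoted above and is still short of the asymptotic slope of 3.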

In §4 we have discussed a number of possible uncertainties in our analysis, including our 
treatment of interlopers and BCGs, our use of a restricted one-parameter family of distribution 
functions, our use of spherical models for what is in reality an ensemble of non-spherical clusters, 
and our assumption that clusters form a homologous set. The discussions and tests that we 
have presented on these issues have not provided us with any serious reasons to mistrust our 
results. Nonetheless, it remains true that there are a number of important approximations in 
our treatment. Until more detailed models are constructed, it will remain difficult to fully assess 
all implications of the assumptions in our analysis. Possibly the most important caveat in our 
results remains the fact that we have studied an ensemble cluster, built by co-adding data from 
14 individual clusters under the assumption of homology. This has the advantage of reducing 
the influence of substructure and non-sphericity in individual clusters on the final dataset, but if 
clusters are not a homologous set, then our results will (at best) only be valid in an average sense. 

To conclude, we have presented evidence that suggests that clusters of galaxies have 
approximately: (i) an isotropic velocity distribution; (ii) a mass density profile as predicted 
by NFW; and (iii) a mass-to-number density ratio that is constant with radius. At the very 
least, it has been shown that these properties are not inconsistent with the CNOC1 survey 
data. Additional weight is added to these conclusions by the fact that they are consistent with 
the preliminary results from a weak-lensing study of a set of 10 galaxy clusters with the Big 
Throughput Camera at CTIO. That study also yields mass density profiles that are well fit by 
the NFW parameterization and mass-to-light ratios that are approximately constant with radius 
(Dell'Antonio et al. 2000). 



A. The accuracy of simple kinematic models for cluster velocity histograms 

Ramirez & de Souza (1998) and Ramirez, de Souza & Schade (1999) recently presented an 
alternative method to model cluster velocity histograms. They studied a total of 21 clusters 
(including nine that were observed in the context of the CNOC1 survey) to constrain the orbital 
anisotropy of galaxies of different morphological types. The deviations of the cluster velocity 
histograms from Gaussians were quantified using the higher-order statistic $|u| = (1/N) \sum_{i=1}^{N} |v_i/\sigma|$ 
(this can be viewed as an alternative to using the Gauss-Hermite moments or kurtosis). To 
interpret the observed values of $|u|$ they were compared to the predictions of a simple kinematical 
model that assumes that: (i) the velocity dispersions $\sigma_r$, $\sigma_\theta$ and $\sigma_\phi$ are constant through the 
system; and (ii) the velocity distribution is a (three-dimensional) Gaussian at any point in the 
system. These assumptions remove the laws of gravitational dynamics (both Poisson's equation 
and the collisionless Boltzmann equation) from the problem, as well as any information contained 
in (and dependence on) the galaxy number density profile $\nu(r)$, the mass-density profile $\rho(r)$ and 
the projected velocity dispersion profile $\sigma(R)$. Instead, the shape of the cluster velocity histogram 
in these models is a simple unique function of only one parameter, $\sigma_r/\sigma_t$. The assumptions on 
which these models are based are not generally correct, but Ramirez et al. argue that they are 
reasonable and adequate for the problem at hand. As we will show, this is not actually the case. 
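
For reference, the statistic $|u|$ is straightforward to evaluate. The sketch below assumes that the velocities are already measured relative to the cluster mean and, when no dispersion is supplied, normalizes by the sample rms; the function name and these choices are assumptions made for this example rather than details taken from Ramirez & de Souza.

```python
import math
import random

def mean_abs_u(velocities, sigma=None):
    """Return |u| = (1/N) * sum(|v_i / sigma|) for velocities about the mean."""
    n = len(velocities)
    if sigma is None:
        # Assumed normalization: rms of the supplied velocities.
        sigma = math.sqrt(sum(v * v for v in velocities) / n)
    return sum(abs(v / sigma) for v in velocities) / n

rng = random.Random(2)
u_gaussian = mean_abs_u([rng.gauss(0.0, 1.0) for _ in range(100000)])
u_flat = mean_abs_u([rng.uniform(-1.0, 1.0) for _ in range(100000)])
```

For a Gaussian the expectation value is $\sqrt{2/\pi} \approx 0.80$, while a flat-topped (uniform) distribution gives $\sqrt{3}/2 \approx 0.87$, so departures of $|u|$ from 0.80 measure the non-Gaussianity of the histogram.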

The primary shortcoming of the Ramirez et al. models is the assumption of locally Gaussian 
velocity distributions. This is most easily seen for the case of very tangentially anisotropic 
models, which have large numbers of galaxies on (nearly) circular orbits. This leads to local 
velocity distributions that are strongly bimodal, with peaks at plus and minus the local circular 
velocity. Even after projection, such models often predict velocity distributions with pronounced 
double peaks (e.g., van der Marel & Franx 1993). This behavior of real dynamical systems is not 
reproduced by kinematical models that assume locally Gaussian velocity distributions. In such 
models the line-of-sight velocity distribution is unimodal everywhere along the line of sight, and 
hence the same is always true for the projected velocity distribution. 
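
The bimodality of the local velocity distribution for circular orbits can be illustrated with a toy calculation. The sketch assumes an idealized situation chosen for this example only: a point at which the line of sight is perpendicular to the radius vector, with all galaxies on circular orbits of speed $v_c$ and tangential velocity directions distributed uniformly in angle. The line-of-sight velocity is then $v_c \cos\phi$, with probability density $1/(\pi\sqrt{v_c^2 - v^2})$.

```python
import math
import random

def los_velocities_circular(n, v_circ=1.0, seed=0):
    """Line-of-sight velocities for circular orbits with random orientation."""
    rng = random.Random(seed)
    return [v_circ * math.cos(rng.uniform(0.0, 2.0 * math.pi)) for _ in range(n)]

def fraction_between(values, lo, hi):
    """Fraction of values with lo <= |v| < hi."""
    inside = sum(1 for v in values if lo <= abs(v) < hi)
    return inside / len(values)

sample = los_velocities_circular(100000)
central = fraction_between(sample, 0.0, 0.1)        # near v = 0
outer = fraction_between(sample, 0.9, 1.0 + 1e-9)   # near |v| = v_circ
```

In this toy model the bins near $|v| = v_c$ are several times more populated than the bin near $v = 0$, which is the double-peaked behavior that a locally Gaussian model cannot reproduce.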

Calculations confirm these arguments and quantify the size of the errors that are introduced. 
As an example, we considered models for the CNOC1 ensemble cluster in which the velocity 
dispersion profiles $\sigma_r(r)$ and $\sigma_t(r)$ were calculated from the Jeans equations (as in §2), but in 
which the velocity distribution along the line-of-sight at any point in the system was incorrectly 
assumed to be a Gaussian (with dispersion given by the Jeans equations), instead of being 
calculated from a DF (as in §3). The predicted grand-total velocity histograms were calculated for 
different values of $\sigma_r/\sigma_t$, and their shapes were quantified through the Gauss-Hermite moments 
$h_4$ and $h_6$ (dashed curves in Figure 11). The results can be compared to those obtained with 
a self-consistent dynamical model based on a DF (solid curves in the same figure; same as the 
predictions in Figure 8). It is clear that the results obtained with the simplified analysis have 
little in common with the self-consistent results. The former always yields histograms that are 
more centrally peaked than a Gaussian ($h_4 > 0$), and only isotropic models produce histograms 
that are close to Gaussian ($h_4 \approx 0$). Ramirez et al. find the same general behavior. By contrast, 
self-consistent dynamical models predict a smooth transition from flat-topped to centrally-peaked 
profiles when going from tangential anisotropy to radial anisotropy. So while the data presented 
by Ramirez et al. on the velocity histogram shapes for galaxies of different morphological types 
are very interesting and definitely worth further study, it appears that their kinematical models 
are insufficient to obtain reliable conclusions about orbital anisotropies. 

Fig. 1. — The projected galaxy number density profile $\Sigma(R)$ of the CNOC1 ensemble cluster as 
derived in C97b. The curve shows the parameterized fit given by equation (2), which is used in the 
dynamical modeling. The units along the axes are dimensionless, as described in the text. 

Fig. 2. — The projected velocity dispersion profile $\sigma(R)$ of the CNOC1 ensemble cluster. A 
101-point running average is shown, sampled at intervals of 0.1 in the dimensionless projected radius $R$. 
The curves show the predictions of eleven models, each with different constant velocity dispersion 
anisotropy $\beta$. The models all provide similar, statistically acceptable fits to the data, but each uses 
a different mass density profile $\rho(r)$. 

Fig. 3. — Parameters of the three-dimensional mass density $\rho(r)$ displayed in Figure 4, as function 
of the velocity dispersion anisotropy $\sigma_r/\sigma_t$, with from top to bottom: the scale density $\rho_0$; the 
scale radius $a$; and the central power-law slope $\gamma$. Solid dots indicate the values of $\sigma_r/\sigma_t$ for which 
detailed models are constructed in this paper. 

Fig. 4. — Inferred structure of the CNOC1 ensemble cluster, with from top left to bottom right: 
the three-dimensional galaxy number density $\nu(r)$; the three-dimensional mass density $\rho(r)$; 
the mass-to-number-density ratio $\rho/\nu(r)$; and the enclosed mass $M(r)$. The galaxy number density 
$\nu(r)$ is obtained by deprojection of the projected galaxy number density $\Sigma(R)$ shown in Figure 1. 
The mass density $\rho(r)$ and enclosed mass $M(r)$ are obtained by solving the Jeans equation for a 
spherical system so as to best fit the projected velocity dispersion profile shown in Figure 2. Eleven 
curves are shown, each indicating the best fit for a different constant velocity dispersion anisotropy 
$\beta$. The $\beta$-values of the models are logarithmically spaced in $\sigma_r/\sigma_t$. Heavy curves are for the 
best-fitting isotropic model, short dashed curves are for the best-fitting model with $\sigma_r/\sigma_t = 1/3$, and 
long dashed curves are for the best-fitting model with $\sigma_r/\sigma_t = 3$. The radial range shown along the 
abscissa corresponds approximately to the range for which the models are meaningfully constrained 
by the kinematical data. 

Fig. 5. — Inferred fraction $f_i$ of interloper galaxies in the sample with 1-$\sigma$ error bars, as function 
of the velocity dispersion anisotropy $\sigma_r/\sigma_t$. 

Fig. 6. — The likelihood quantity $\lambda$ (defined in the text) as function of the velocity dispersion 
anisotropy $\sigma_r/\sigma_t$. Solid points show models that were calculated; the curve is a spline fit through 
the points. The best fit model has $\sigma_r/\sigma_t = 0.92$, and is close to isotropic. The 68.3% and 95.4% 
confidence boundaries are indicated, as inferred from the likelihood-ratio statistic $\lambda - \lambda_{\min}$. 

Fig. 7. — The heavy curve in each panel is the normalized grand-total velocity histogram for 
the CNOC1 ensemble cluster, as function of $|v|$. The regions between the two thin curves are 
the predictions of models with, from left to right, $\sigma_r/\sigma_t = 0.52$, 1.00 and 1.93. The model 
predictions take into account the shot noise due to the finite number of galaxies. In Monte-Carlo 
simulations the occupancy in each bin falls between the two thin curves in 68.3% of the drawings. 
Tangentially anisotropic models yield histograms that are more flat-topped than a Gaussian while 
radially anisotropic models yield histograms that are more centrally peaked. Of the models that 
are displayed, the isotropic model (middle panel) provides the best fit to the data. 



[Figure 8: Gauss-Hermite moments plotted against log($\sigma_r/\sigma_t$); curves labeled "observed" and "predicted".]



Fig. 8. — The Gauss-Hermite moments $h_4$ and $h_6$ of the grand-total velocity histogram for the 
CNOC1 ensemble cluster, as function of the velocity dispersion anisotropy $\sigma_r/\sigma_t$. The contribution 
from interlopers was subtracted as described in the text. The dashed curves indicate the values 
calculated for the observed dataset; these values depend mildly on the assumed anisotropy, because 
the best estimate for the fraction of interloper galaxies in the sample does. The solid points indicate 
the predictions of the dynamical models; the error bars are the 68.3% confidence regions calculated 
from Monte-Carlo simulations which take into account the finite number of galaxies in the sample. 
Only models that are close to isotropic provide an acceptable fit to both $h_4$ and $h_6$. 

Fig. 9. — Dots connected by a solid curve are the properties calculated for an ensemble of 
axisymmetric two-integral Hernquist models of fixed axial ratio $q$, seen from random viewing 
directions. The Gauss-Hermite moment $h_4$ refers to the grand-total ensemble velocity histogram, 
and $\langle\sigma_r/\sigma_t\rangle$ is the velocity anisotropy defined in equation (13) in terms of mass-weighted averages 
over the system. The dots are for $q = 0.3$, 0.4, 0.6, 0.8 and 1.0, from left to right respectively. For 
comparison, the dashed curve shows the relation between $h_4$ and anisotropy for spherical 
constant-$\beta$ Hernquist models. The close similarity between the results shows that it is reasonable to use 
spherical models to interpret data for an ensemble of clusters that may individually not be spherical. 

Fig. 10. — Solid curves show, from top to bottom, the projected intensity, velocity dispersion 
and Gauss-Hermite moment $h_4$ as function of radius for an ensemble of axisymmetric two-integral 
Hernquist models of fixed axial ratio $q = 0.6$, seen from random viewing directions. For comparison, 
the dashed curves in the bottom two panels show the predictions for a spherical constant-$\beta$ 
Hernquist model with the same projected intensity profile, and the same overall anisotropy $\langle\sigma_r/\sigma_t\rangle$. 
The close similarity between the results shows that it is reasonable to use spherical models to 
interpret data for an ensemble of clusters that may individually not be spherical. 

Fig. 11. — Gauss-Hermite moments $h_4$ and $h_6$ of the grand-total velocity histogram for the CNOC1 
ensemble cluster, as function of the velocity dispersion anisotropy $\sigma_r/\sigma_t$. Solid curves show the 
predictions obtained with the DF modeling approach of §3 (same as the predictions in Figure 8, 
but without the shot-noise related error bars). Dashed curves show the predictions obtained when 
one incorrectly assumes that the velocity distribution is Gaussian everywhere along the line of sight, 
as described in Appendix A. The latter assumption, which was employed in papers by Ramirez et 
al., does not yield the correct results. 



